(57) Abstract: EKG sensors 
(150) are placed on a patient 
(140) to receive electrocardiogram 
(EKG) recording signals, which are 
typically combinations of original 
signals from different sources, such 
as pacemaker signals, QRS complex 
signals, and irregular oscillatory 
signals that suggest an arrhythmia 
condition. A computing module 
(120) uses independent component 
analysis to separate the recorded 
EKG signals. The separated signals 
are displayed to help physicians 
to analyze heart conditions and 
to identify probable locations of 
abnormal heart conditions. At least 
a portion of the separated signals 
can be further displayed in a chaos 
phase space portrait to help detect 
abnormality in heart conditions. 
SYSTEM AND METHOD FOR SEPARATING CARDIAC SIGNALS 
Background of the Invention 

Field of the Invention 

The present invention relates to medical devices for recording cardiac signals and 
separating the recorded cardiac signals. 
Description of the Related Art 

Electrocardiogram (EKG) recording is a valuable tool for physicians to study patient heart 
conditions. In a typical 12-lead arrangement, up to 12 sensors are placed on a subject's chest or 
abdomen and limbs to record the electric signals from the beating heart. Each sensor, along with a 
reference electrode, forms a separate channel that produces an individual signal. The signals from 
the different sensors are recorded on an EKG machine as different channels. The sensors are 
usually unipolar or bipolar electrodes or other devices suitable for measuring the electrical potential 
on the surface of a human body. Since different parts of the heart, such as the atria and ventricles, 
produce different spatial and temporal patterns of electrical activity on the body surface, the signals 
recorded on the EKG machine are useful for analyzing how well individual parts of the heart are 
functioning. 

A typical heartbeat signal has several well-characterized components. The first component 
is a small hump in the beginning of a heartbeat called the "P-Wave". This signal is produced by the 
right and left atria. There is a flat area after the P-Wave which is part of what is called the PR 
Interval. During the PR interval the electrical signal is traveling through the atrioventricular 
(AV) node. The next large spike in the heartbeat signal is called the "QRS Complex." The QRS 
Complex is a tall, spiky signal produced by the ventricles. Following the QRS complex is another 
smaller bump in the signal called the "T-Wave," which represents the electrical resetting of the 
ventricles in preparation for the next signal. When the heart beats continuously, the P-QRS-T waves 
repeat over and over. 

Many publications have described studying cardiac signals and detecting abnormal heart 
conditions. Sample publications include U.S. Patent Publication No. 20020052557; Podrid & 
Kowey, Cardiac Arrhythmia: Mechanisms, Diagnosis, and Management, Lippincott Williams & 
Wilkins Publishers (2nd edition, August 15, 2001); Marriott & Conover, Advanced Concepts in 
Arrhythmias, Mosby Inc. (3rd edition, January 15, 1998); and Josephson, M.E., Clinical Cardiac 
Electrophysiology: Techniques and Interpretations, Lippincott Williams & Wilkins Publishers; 
ISBN (3rd edition, December 15, 2001). 

Unfortunately, although EKG signals have been studied for decades, they are difficult to 
assess because EKG signals recorded at the surface are mixtures of signals from multiple sources. 
Typically, it is relatively straightforward to measure the shape of the QRS complex since this signal 
is so strong. However, irregularly shaped P-wave or T-wave signals, along with weak irregular 
oscillatory signals that suggest a heart arrhythmia, are often masked by large pacemaker signals or 
the strong QRS complex signals. Thus, it can be very difficult to isolate small irregular oscillatory 
signals and to identify arrhythmia conditions. 

In addition, atrial and ventricular signals are sometimes undesirably superimposed over one 
another. In many cases, diagnosis of disease states requires these signals to be separated from one 
another. For example, it might be desirable to separate P wave signals from QRS complex signals, 
so that signals originating in an atrium are isolated from signals representing concurrent activities in 
the ventricle. 

In some practices the EKG signals are electronically "filtered" by excluding signals of 
certain frequencies. The signals are also "averaged" to remove largely random or asynchronous 
data, which is assumed to be meaningless "noise." The filtering and averaging methods 
irreversibly eliminate portions of the recorded signals. In addition, it is not proven whether the 
more random data is truly "noise" and truly meaningless. It might be that the signals that are 
removed are indicative of a disease state in a patient. Another method, disclosed in U.S. Patent 
No. 6,308,094 entitled "System for prediction of cardiac arrhythmias," uses the Karhunen-Loeve 
transformation to decompose or compress cardiac signals into elements that are deemed 
"significant." As a result, the information that is deemed "insignificant" is lost. 

Compared to other signal separation applications, separating EKG recording signals 
presents additional challenges. For example, the sources are not always stationary since the heart 
chambers contract and expand during beating. Additionally, the activity of a single chamber may 
be mistaken for multiple sources because of the presence of moving waves of electrical activity 
across the heart. If electrodes are not securely attached to the patient, or if the patient moves (for 
example, older patients may suffer from uncontrolled jittering), the movement of the electrodes also 
undesirably generates signals. In addition, multiple signals can be sensed by the EKG which are 
unrelated to the cardiac signature, such as myopotentials, i.e., electrical signals from muscles other 
than the heart. 

There have been disclosures of cardiac rhythm management systems that store a list of 
triggers. U.S. Patent No. 6,400,982 entitled "Cardiac rhythm management system with arrhythmia 
prediction and prevention" discloses such a system. If a trigger matches detected cardiac signals 
from a patient, the system calculates the probability of arrhythmia and activates a prevention 
therapy for the patient. However, the cardiac signals are in fact mixtures of signals from multiple 
sources, and the signals that are important for arrhythmia detection can be masked by other signals. 
It is therefore desirable to separate the cardiac signals used in the cardiac rhythm management 
systems. 

Independent component analysis (ICA) is a technique for separating mixed source signals 
(components) which are presumably independent from each other. In its simplified form, 
independent component analysis applies an "un-mixing" matrix of weights to the mixed signals, for 
example by multiplying the matrix with the mixed signals, to produce separated signals. The weights 
are assigned initial values, and then adjusted to minimize information redundancy in the separated 
signals. Because this technique does not require information on the source of each signal, it is 
known as a "blind source separation" method. Blind separation problems refer to the idea of 
separating mixed signals that come from multiple independent sources. Although there are many 
ICA techniques currently known, most have evolved from the original work described in U.S. 
Patent No. 5,706,402 issued on January 6, 1998. Additional references on ICA and blind source 
separation can be found in, for example, A. J. Bell and T. J. Sejnowski, Neural Computation 7: 1129- 
1159 (1995); Te-Won Lee, Independent Component Analysis: Theory and Applications, Kluwer 
Academic Publishers, Boston, September 1998; Hyvarinen et al., Independent Component Analysis, 
1st edition (Wiley-Interscience, May 18, 2001); Mark Girolami, Self-Organizing Neural Networks: 
Independent Component Analysis and Blind Source Separation (Perspectives in Neural Computing) 
(Springer Verlag, September 1999); and Mark Girolami (Editor), Advances in Independent 
Component Analysis (Perspectives in Neural Computing) (Springer Verlag, August 2000). Singular 
value decomposition algorithms have been disclosed in Adaptive Filter Theory by Simon Haykin 
(Third Edition, Prentice-Hall (NJ), 1996). 

There have been suggestions to use chaos theory to analyze cardiac signals to detect abnormal 
heart conditions. Sample disclosures include U.S. Patent Nos. 5,439,004, 5,342,401, 5,447,520 and 
5,456,690; PCT application Nos. WO02/34123 and WO0224276; and Smith et al., Electrical Alternans 
and Cardiac Electrical Instability, Circulation, Vol. 77, No. 1, pp. 110-121 (January 1988). Other 
approaches are disclosed in U.S. Patent No. 5,447,520 issued to Spano, et al. and U.S. Patent No. 
5,201,321 issued to Fulton. Chaos theory is defined as the study of complex nonlinear dynamic 
systems. Complex implies just that, nonlinear implies recursion and higher mathematical 
algorithms, and dynamic implies non-constant and non-periodic. Thus chaos theory is, very 
generally, the study of changing complex systems based on mathematical concepts of recursion, 
whether in the form of a recursive process or a set of differential equations modeling a physical 
system. 

When a bounded chaotic system has some kind of long-term pattern, but the pattern is not a 
simple periodic oscillation or orbit, then the system has a "Strange Attractor". If the system's 
behavior is plotted in a graph over an extended period, patterns can be discovered that are not 
obvious in the short term. In addition, in these types of systems, no matter what the initial 
conditions are, usually the same pattern is found to emerge. The area for which this recurring 
pattern holds true is called the "basin of attraction" for the attractor. Chaos theory methods have 
been described in, for example, N. H. Packard, J. P. Crutchfield, J. Doyne Farmer, and R. S. Shaw, 
Geometry from a Time Series, Physical Review Letters, 47 (1980), p. 712; F. Takens, Detecting 
Strange Attractors in Turbulence, in Lecture Notes in Mathematics 898, D. A. Rand and L. S. 
Young, eds., (Berlin: Springer-Verlag, 1981), p. 336; and J. P. Crutchfield, J. Doyne Farmer, N. H. 
Packard, and R. S. Shaw, On Determining the Dimension of Chaotic Flows, Physica 3D, (1981), 
pp. 605-17. 

For all of these reasons, what is needed in the art is a system that can accurately separate 
medical signals from one another in order to diagnose disease states. 
Summary of the Invention 

The present application discloses systems and methods for using independent component 
analysis to determine the existence and location of anomalies such as arrhythmias of a heart. The 
disclosed systems and methods can be applied to suggest the location of atrial fibrillation, and to 
locate arrhythmogenic regions of a chamber of the heart using heart cycle signals measured from a 
body surface of the patient. Non-invasive localization of the ectopic origin allows focal treatment to 
be quickly targeted to effectively inhibit these complex arrhythmias without having to rely on 
widespread and time-consuming sequential searches or on massively invasive simultaneous 
intracardiac sensor techniques. The effective localization of these complex arrhythmias can be 
significantly enhanced by using independent component analysis to separate superimposed heart 
cycle signals originating from differing chambers or regions of the heart tissue. In addition, the 
signals that are separated by ICA are preferably also analyzed by plotting them on a chaos phase 
space portrait. 

One aspect of the invention relates to a medical system for separating cardiac signals. This 
aspect includes a receiving module to receive recorded cardiac signals from medical sensors, a 
computing module to separate the received signals using independent component analysis to 
produce separated signals, and a display module to display the separated signals. 

Another aspect of the invention relates to a method of detecting arrhythmia in a patient. 
The method includes placing EKG sensors on a patient to produce recorded EKG signals, sending 
the recorded signals to a computing module to separate the recorded signals into separated signals 
using independent component analysis, and reviewing a display of the separated signals to 
determine the existence of arrhythmia in the patient. In a preferred embodiment, each component 
of separated signals corresponds to a channel of recorded signals and its sensor location. Therefore, 
when one or more components of separated signals that suggest arrhythmia are detected, the 
corresponding one or more sensor locations also suggest the location of arrhythmia. 

Yet another aspect of the invention relates to a cardiac rhythm management system. The 
system includes a cardiac signal recording module to record cardiac signals of a patient, a 
computing module to separate the recorded signals into separated signals using independent 
component analysis, and a detection module to detect or to predict an abnormal condition based on 
analyzing the separated signals. The system also includes a treatment module to treat the patient or 
a warning module to issue a warning when the abnormal condition is detected or predicted. 

Other aspects and embodiments of the invention are described below in the detailed 
description section or defined by the claims. 
Brief Description of the Drawings 
FIGURE 1 is a diagram of an EKG system according to one embodiment of the invention. 
FIGURE 2 is a flowchart illustrating one embodiment of a process for separating cardiac 
signals. 

FIGURE 3A is a sample chart of recorded EKG signals. 

FIGURE 3B is a sample chart of separated EKG signals. 

FIGURE 3C is a sample chart of one component of separated signals back projected on the 
recorded signals. 

FIGURE 4A is a chaos phase space portrait of three components of separated EKG signals 
of a healthy subject. 

FIGURE 4B is a chaos phase space portrait of three components of separated EKG signals 
of a subject with an abnormal heart condition. 

Detailed Description of the Preferred Embodiment 
Embodiments of the invention relate to a system and method for accurately separating 
medical signals in order to determine disease states in a patient. In one embodiment, the system 
analyzes EKG signals in order to determine whether a patient has a heart ailment or irregularity. As 
discussed in detail below, embodiments of the system utilize the techniques of independent 
component analysis to separate the medical signals from one another. 

In addition to the signal separation technique, embodiments of the invention also relate to 
systems and methods that first separate signals using ICA, and then perform an analysis on a 
specific isolated signal, or set of isolated signals, using a "chaos" analysis. As described earlier, 
chaos theory (also called nonlinear dynamics) studies patterns that are not completely random, but 
cannot be determined by simple formulas. Because cardiac signals are typically non-random, but 
cannot be easily described by a simple formula, chaos theory analysis as described below provides 
an effective tool to analyze these signals and determine disease states. 

Accordingly, once the signals are separated using ICA, they can be plotted to produce a 
chaos phase space portrait. By reviewing the patterns in the phase space portrait, for example 
reviewing the existence and location of one or more attractors, or comparing established healthy 
patterns and established abnormal patterns with the patterns of the patient, a user is able to assess 
the likelihood of abnormality in the signals, which indicates disease conditions in the patient. 

FIGURE 1 is a diagram of an EKG system that includes a computing module for signal 
separation according to one embodiment of the present invention. As shown in FIGURE 1, 
electrode sensors 150 are placed on the chest and limb of a patient 140 to record electric signals. 
The electrodes send the recorded signals to a receiving module 110 of the EKG system 100. After 
optionally performing signal amplification, analog-to-digital conversion or both, the receiving 
module 110 sends the received signals to a computing module 120 of the EKG system 100. The 
computing module 120 uses an independent component analysis method to separate the recorded 
signals to produce separated signals. The independent component analysis method is 
described in detail in the Appendix and below with respect to FIGURE 2. 

The computing module 120 can be implemented in hardware, software, or a combination of 
both. It can be located physically within the EKG system 100 or connected to the recorded signals 
received by the EKG system 100. A displaying module 130, which includes a printer or a monitor, 
displays the separated signals on paper or on screen. The displaying module 130 can be located 
within the EKG system 100 or connected to it. Optionally, the displaying module 130 also displays 
the recorded signals on paper or on screen. In one embodiment, the displaying module also 
displays some components of the separated signals in a chaos phase space portrait. 

In one embodiment, the EKG system 100 also includes a database (not shown) that stores 
recognized EKG signal triggers and corresponding diagnoses. The triggers refer to conditions that 
indicate the likelihood of arrhythmia. For example, triggers can include sinus beats, premature 
sinus beats, beats following long sinus pauses, long-short beat sequences, R on T-wave beats, 
ectopic ventricular beats, premature ventricular beats, and so forth. Triggers can include threshold 
values that indicate arrhythmia, such as threshold values of ST elevations, heart rate, increase or 
decrease in heart rate, late-potentials, abnormal autonomic activity, and so forth. A left bundle- 
branch block diagnosis can be associated with triggers such as the absence of a q wave in leads I and 
V6, a QRS duration of more than 120 msec, small notching of the R wave, etc. 

Triggers can be based on a patient's history, for example the percentage of abnormal beats 
detected during an observation period, the percentage of premature or ectopic beats detected during 
an observation period, heart rate variation during an observation period, and so forth. Triggers may 
also include, for example, the increase or decrease of ST elevation in beat rate, the increase in 
frequency of abnormal or premature beats, and so forth. 

A matching module (not shown) attempts to match the separated signals with one or more 
of the stored triggers. If a match is found, the matching module displays the matched 
corresponding diagnosis, or sends a warning to a healthcare worker or to the patient. Methods such 
as computer-implemented logic rules, classification trees, expert system rules, statistical or 
probability analysis, pattern recognition, database queries, artificial intelligence programs and 
others can be used to match the separated signals with stored triggers. 
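
The rule-based matching described above can be sketched as follows. This is a minimal illustration only, assuming that features have already been measured from the separated signals; the `Trigger` class, the feature names `qrs_duration_ms` and `q_wave_in_I_and_V6`, and the single stored rule are hypothetical and are not taken from the disclosed system. The thresholds mirror the left bundle-branch block example given earlier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical feature dictionary measured from the separated signals.
Features = Dict[str, float]


@dataclass
class Trigger:
    """A stored trigger: a condition on the features plus its diagnosis."""
    name: str
    diagnosis: str
    condition: Callable[[Features], bool]


# Illustrative trigger database with one logic rule (assumed, not from the source):
# QRS duration above 120 msec and no q wave in leads I and V6.
TRIGGERS: List[Trigger] = [
    Trigger(
        name="wide QRS without q wave in I and V6",
        diagnosis="possible left bundle-branch block",
        condition=lambda f: f["qrs_duration_ms"] > 120
        and f["q_wave_in_I_and_V6"] == 0,
    ),
]


def match_triggers(features: Features, triggers: List[Trigger] = TRIGGERS) -> List[str]:
    """Return the diagnoses of all stored triggers that the features satisfy."""
    return [t.diagnosis for t in triggers if t.condition(features)]
```

A caller would display the returned diagnoses or forward them as a warning; classification trees or statistical matching could replace the simple logic rule without changing this interface.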

FIGURE 2 is a flowchart illustrating one embodiment of a process for separating EKG 
signals. The process starts from a start block 202, and proceeds to a block 204, where the 
computing module 120 of the EKG system 100 receives the recorded signals X_j from the electrode 
sensors, with J being the number of channels. Prior to processing, the signals can be amplified to 
strengths suitable for computer processing. Analog-to-digital conversion of signals can also be 
performed. 

From the block 204, the process proceeds to a block 206, where the initial values for an "un- 
mixing" matrix of scaling weights W_ij are selected. In one embodiment, the initial values for a 
matrix of initial weights W_i0 are also selected. The process then proceeds to a block 208, where a 
plurality of training signals Y_i are produced by operating the matrix on the recorded signals. In a 
preferred embodiment, the training signals are produced by multiplying the matrix with the 
recorded signals such that Y_i = W_ij * X_j (summed over the channel index j). In one embodiment, 
the initial weights W_i0 are included such that Y_i = W_ij * X_j + W_i0. The process proceeds from 
the block 208 to a block 210, wherein the 
scaling weights W_ij and optionally the initial weights W_i0 are adjusted to reduce the information 
redundancy among the training signals. Methods of adjusting the weights are described in 
the Appendix. 

The process proceeds to a decision block 212, where the process determines whether the 
information redundancy has been reduced to a satisfactory level. The criteria for the determination 
are described in the Appendix. If the process determines that information redundancy among 
the training signals has been reduced to a satisfactory level, then the process proceeds to a block 
214, where the training signals are displayed as separated signals Y_i, with I being the number of 
components for the separated signals. In a preferred embodiment, I, the number of components of 
separated signals, is equal to J, the number of channels of recorded signals. If the redundancy has 
not been reduced to a satisfactory level, the process 
returns from the block 212 to the block 208 to again adjust the weights. From the block 214, the 
process proceeds to an end block 216. 
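
The loop of blocks 204 through 216 can be sketched in code. The Appendix is not reproduced in this section, so the weight update used here (a natural-gradient infomax step with a tanh nonlinearity, suited to super-Gaussian sources) and the stopping test (size of the latest weight update) are assumptions standing in for the methods and criteria the Appendix describes. The function name `separate_signals` and its parameters are hypothetical, and centering the data stands in for the optional initial weights W_i0.

```python
import numpy as np


def separate_signals(X, lr=0.05, tol=1e-6, max_iter=2000):
    """Separate recorded signals X (J channels x T samples) into Y = W @ X.

    Sketch only: update rule and stopping test are assumed, not the
    patent's Appendix methods.
    """
    X = X - X.mean(axis=1, keepdims=True)   # centering replaces the bias W_i0
    J, T = X.shape
    W = np.eye(J)                            # block 206: initial un-mixing weights
    for _ in range(max_iter):
        Y = W @ X                            # block 208: training signals
        # block 210: natural-gradient infomax step (assumes super-Gaussian sources)
        dW = lr * (np.eye(J) - np.tanh(Y) @ Y.T / T) @ W
        W += dW
        if np.max(np.abs(dW)) < tol:         # block 212: stand-in convergence test
            break
    return W @ X, W                          # block 214: separated signals
```

With I equal to J, as in the preferred embodiment, W is square and can be inverted later for back projection.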

For the un-mixing matrix W with the final weight values, its rows represent the time 
courses of relative strengths/activity levels (and relative polarities) of the respective separated 
components. Its weights give the surface topography of each component, and provide evidence for 
the components' physiological origins. For the inverse of matrix W, its columns represent the 
relative projection strengths (and relative polarities) of the respective separated components onto 
the channels of recorded signals. The back projection of the ith independent component onto the 
recorded signal channels is given by the outer product of the ith row of the separated signals matrix 
with the ith column of the inverse un-mixing matrix, and is expressed in terms of the original 
recorded signals. Thus 
cardiac dynamics or activities of interest accounted for by single or by multiple components can be 
obtained by projecting one or more ICA components back onto the recorded signals, X = W^-1 * Y, 
where Y is the matrix of separated signals, Y = W * X. 
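
The back projection just described can be written directly with NumPy. This is a minimal sketch; the function name `back_project` and the argument layout (channels along rows, time samples along columns) are assumptions made for illustration.

```python
import numpy as np


def back_project(W, X, components):
    """Project the selected ICA components back onto the recorded channels.

    W: square un-mixing matrix; X: recorded signals (channels x samples);
    components: iterable of component indices to keep.
    """
    Y = W @ X                      # separated signals, Y = W * X
    W_inv = np.linalg.inv(W)       # columns give projection strengths
    # Sum of outer products: column i of W^-1 with row i of Y.
    return sum(np.outer(W_inv[:, i], Y[i, :]) for i in components)
```

Projecting every component back reproduces X, which follows from X = W^-1 * Y; projecting a single component gives that component's contribution to each recorded channel.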

The separated signals are determined by the ICA method to be statistically independent and 
are presumed to be from independent sources. Regardless of whether there is in fact some 
dependence between the separated EKG signals, test results show that the separated signals provide 
a beneficial perspective for physicians to detect and to locate the abnormal heart conditions of a 
patient. 

In a preferred embodiment, time-delay between source signals is ignored. Since the 
sampling frequencies of cardiac signals are in the relatively low 200-500 Hz range, the effect of 
time-delay can be neglected. 

Improved methods of ICA can be used to speed up the signal separation process. In one 
embodiment, a generalized Gaussian mixture model is used to classify the recorded signals into 
mutually exclusive classes. The classification methods have been disclosed in U.S. Patent 
Application No. 09/418,099 titled "Unsupervised adaptation and classification of multiple classes 
and sources in blind source separation" and PCT Application No. WO0127874 titled "Unsupervised 
adaptation and classification of multi-source data using a generalized Gaussian mixture model." In 
another embodiment, the computing module 120 incorporates a priori knowledge of cardiac 
dynamics, for example supposing separated QRS components to be highly kurtotic and (ar)rhythmic 
component(s) to be sub-Gaussian. ICA methods with incorporated a priori knowledge have been 
disclosed in T-W. Lee, M. Girolami and T. J. Sejnowski, Independent Component Analysis using an 
Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources, Neural 
Computation, 1999, Vol. 11(2): 417-441. 
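
The distinction between highly kurtotic (super-Gaussian) and sub-Gaussian components can be checked numerically with the excess kurtosis, which is positive for peaked, heavy-tailed signals and negative for flatter ones. The sketch below is illustrative only; the function names are hypothetical, and using the sign of the excess kurtosis as the decision rule is an assumption rather than the procedure of the cited reference.

```python
import numpy as np


def excess_kurtosis(y):
    """Fourth standardized moment minus 3 (zero for a Gaussian signal)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0


def label_components(Y):
    """Label each row of Y (components x samples) by the sign of its kurtosis."""
    return [
        "super-Gaussian" if excess_kurtosis(row) > 0 else "sub-Gaussian"
        for row in Y
    ]
```

A spiky QRS-like component would be expected to fall in the super-Gaussian group, while a sustained oscillation, which resembles a sinusoid, would be expected to fall in the sub-Gaussian group.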

FIGURE 3A illustrates a ten-second portion of 12 channels of signals that were gathered as 
part of an EKG recording. The horizontal axis in FIGURE 3A represents time progression of ten 
seconds. The vertical axis represents channel numbers 1 to 12. The signals of FIGURE 3A are, in 
this case, from a patient that provided a mixture of multiple signals, including QRS complex 
signals, pacemaker signals, multiple oscillatory activity signals, and noise. However, because these 
signals were all occurring simultaneously, they cannot be easily separated from one another using 
conventional EKG equipment. 

In contrast, FIGURE 3B illustrates output signals separated from the mixture signals of 
FIGURE 3A, according to one embodiment of the present invention. As above, the horizontal axis 
in FIGURE 3B represents time progression of ten seconds and the vertical axis represents the 
separated components 1 to 12. The separated signals in FIGURE 3B are displayed as components 1 
to 12 corresponding to the channels 1 to 12 in FIGURE 3A, so that a physician can identify a 
separated signal as relating to its respective recorded signal's corresponding sensor location on the 
patient body. For example, in a standard 12-lead arrangement, leads II, III and aVF represent 
signals from the inferior region. Leads V1, V2 represent signals from the septal region. Leads V5, 
V6, I, and aVL represent signals from the lateral heart. Right and posterior heart regions typically 
require special lead placement for recording. To better identify the location of a heart condition, 
more than 12 leads can be used. For example, 20, 30, 40, 50, or even hundreds of sensors can be 
placed on various portions of a patient's torso. Fewer than 12 leads can also be used. The sensors 
are preferably non-invasive sensors located on the patient's body surface, but invasive sensors can 
also be used. With separated signals each corresponding to one of the locations, a physician can 
review the signals and detect abnormalities that correspond to the respective locations. 

As shown in FIGURE 3B, the component #1 represents the pacemaker signals and the early 
part of QRS complex signals. The component #2 represents major portions of later parts of the 
QRS complex signals. QRS complex signals represent the depolarization of the left ventricle. The 



WO 03/003905 PCT/US02/21277 

component #10 represents atrial fibrillation (a type of arrhythmia) signals. Therefore atrial 
fibrillation is predicted to be located at the sensor location that corresponds to channel #10. 
Although components #1 and #10 contain similar frequency contents of oscillatory activity between 
heart beats, they capture activities from different spatial locations. 
For EKG signals, we discovered that the signals separated using ICA are usually more 
independent from each other and have less information redundancy than signals that have not been 
processed through ICA. Compared to the recorded signals, the separated signals usually better 
represent the signals from the original sources of the patient's heart. In addition to arrhythmia, the 
separated cardiac signals can also be used to help detect other heart conditions. For example, the 
separated signals, especially the separated QRS complex signals, can be used to detect premature 
ventricular contraction. The separated signals, especially the separated Q wave signals, can be used 
to detect myocardial infarction. Separating the EKG signals, especially separating the QRS complex 
and T wave signals, can help distinguish left and right bundle branch block. 

Of course, the disclosed system and method are not limited to detecting arrhythmia, or any 
particular type of disease state. Embodiments of the invention include all methods of analyzing 
medical signals using ICA. For example, when a pregnant woman undergoes EKG recording, the 
heart signals from the woman and from the fetus(es) can be separated. 

The separated cardiac signals can be characterized as non-random but not easily 
deterministic, which makes them suitable subjects for chaotic analysis. As mentioned above, chaos 
theory (also called nonlinear dynamics) studies patterns that are not completely random but cannot 
be determined by simple formulas. The separated signals can be plotted to produce a chaos phase 
space portrait. By reviewing the patterns in the phase space portrait, including the existence and 
location of one or more attractors, a user is able to assess the likelihood of abnormality in the 
signals, which indicates disease conditions in the patient. 

In a preferred embodiment, the QRS complex signals are separated into three different 
components, with each component representing a portion of the QRS complex. The 3 components 
are 3 data sets that are found to be temporally statistically independent using independent 
component analysis. Using the three components, a 3-dimensional phase space portrait of the QRS 
complex can be displayed to show the trajectory of the three components. 
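
The three-component trajectory can be assembled by treating the amplitudes of the three components at each time sample as one point in three-dimensional space. The sketch below also locates the most densely visited region of that space with a coarse histogram, as a rough numerical stand-in for spotting a dense cluster by eye. The function names, the default component indices, and the bin count are assumptions for illustration, not part of the disclosed system.

```python
import numpy as np


def phase_space_trajectory(Y, components=(0, 1, 2)):
    """Stack three separated components into (T, 3) phase-space points."""
    return np.stack([Y[i] for i in components], axis=1)


def densest_cell(points, bins=10):
    """Return the center and count of the most visited histogram cell."""
    hist, edges = np.histogramdd(points, bins=bins)
    idx = np.unravel_index(np.argmax(hist), hist.shape)
    center = np.array([(e[i] + e[i + 1]) / 2 for e, i in zip(edges, idx)])
    return center, int(hist[idx])
```

The returned points can be passed to any 3-D plotting routine to draw the portrait; the densest cell only summarizes where the trajectory spends most of its time and does not by itself establish the presence of an attractor.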

FIGURE 3C is a sample chart of the component #10 of separated signals (as shown in 
FIGURE 3B) back projected onto the recorded signals of FIGURE 3A. The separated signals of 
component #10, which indicate arrhythmia, are identified by reference number 302 in FIGURE 3C. 
The 12 channels of recorded signals are identified by reference number 304 for ease of 
identification. FIGURE 3C therefore allows direct visual comparison of a separated component 
against channels of recorded signals. The back projections of cardiac dynamics allow us to examine 
the amount of information accounted for by single or by multiple components in the recorded 
signals and to confirm the components' physiological meanings suggested by the surface 
topography (the aforementioned columns of the inverse of the un-mixing matrix). 

FIGURE 4A illustrates the phase space portrait of the EKG recording of a healthy subject.
FIGURE 4B illustrates the phase space portrait of the EKG recording of an atrial fibrillation patient.
In FIGURES 4A and 4B, the x, y, and z axes represent the amplitudes of the 3 QRS components.
The separated signals' values over time are plotted to produce the phase space portraits. In the
healthy EKG recording of FIGURE 4A, the dense cluster 402 indicates the existence of an attractor
that attracts the signal values to the region of the dense cluster 402. The dense cluster 402
represents the most frequent occurrences of the signals. In the atrial fibrillation patient EKG
recording of FIGURE 4B, an additional loop 404, which is not part of the dense cluster 402, is
below the attractor and the dense cluster 402 and closer to the base plane than the dense cluster 402.
This additional loop 404 is presumably due to the oscillatory activity in the baseline portions of the
EKG signals. The separated component #10 signal that indicates an arrhythmia condition is
presumably responsible for the additional loop 404. The visual pattern can be compared with the
visual pattern of a healthy subject and manually recognized as indicating an abnormal
condition such as atrial fibrillation.
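
The construction of such a portrait can be sketched as follows. This is only an illustration under stated assumptions: the function names, the cube-binning used to locate a dense cluster, and the toy amplitudes are all hypothetical, and the text above describes visual inspection rather than any particular density estimate.

```python
from collections import Counter

def phase_space_points(comp_x, comp_y, comp_z):
    """Pair up three component time series into (x, y, z) points, one per sample."""
    return list(zip(comp_x, comp_y, comp_z))

def densest_cell(points, cell_size):
    """Bin points into cubes of side cell_size; return (cell, count) for the fullest cube."""
    counts = Counter(tuple(int(v // cell_size) for v in point) for point in points)
    return counts.most_common(1)[0]

# Toy amplitudes: three samples stay near the origin, one excursion leaves it.
points = phase_space_points([0.1, 0.2, 0.15, 2.0],
                            [0.0, 0.1, 0.05, 2.0],
                            [0.3, 0.2, 0.25, -1.0])
cell, count = densest_cell(points, 1.0)
```

Passing `points` to any 3-D plotting tool gives the portrait itself; the fullest cube is a crude stand-in for the dense cluster, and samples far from it are candidates for an additional loop.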

Instead of the 3 QRS complex components as shown in FIGURE 4B, other components or
more than 3 components can also be used to plot the chaos phase space portrait. If more than 3
components are used, the different components can be plotted in different colors. The 3 QRS
complex components of FIGURE 4B are selected because test results suggest that such a phase
space portrait is physiologically significant and usually functions well as an indication of a patient's
heart condition.

Although FIGURES 3A, 3B, 4A and 4B were produced using test results related to the
detection and localization of focal atrial fibrillation, the disclosed systems and methods can be used
to detect and to localize other heart conditions including focal and re-entrant arrhythmia. The
disclosed systems and methods can also be used to detect and to localize paroxysmal atrial
fibrillation as well as persistent and chronic atrial fibrillation.

The disclosed methods can be used to improve existing implantable cardioverter/defibrillators
(ICDs) that can deliver electrical stimuli to the heart. In addition to existing ICDs and existing
pacemakers, some of the existing cardiac rhythm management devices also combine the functions
of pacemakers and ICDs. A computing module embodying the disclosed methods can be added to
the existing systems to separate the recorded cardiac signals. The separated signals are then used
by the cardiac rhythm management systems to detect or to predict abnormal conditions. Upon
detection or prediction, the cardiac rhythm management system automatically treats the patient, for
example by delivering pharmacologic agents, pacing the heart in a particular mode, delivering
cardioversion/defibrillation shocks to the heart, or applying neural stimulation to the sympathetic or
parasympathetic branches of the autonomic nervous system. Instead of or in addition to automatic
treatment, the system can also issue a warning to a physician, a nurse or the patient. The warning
can be issued in the form of an audio signal, a radio signal, and so forth. The disclosed signal
separation methods can be used in cardiac rhythm management systems in hospitals, in patients'
homes or nursing homes, or in ambulances. The cardiac rhythm management systems include
implantable cardioverter defibrillators, pacemakers, biventricular or other multi-site coordination
devices and other systems for diagnostic EKG processing and analysis. The cardiac rhythm
management systems also include automatic external defibrillators and other external monitors,
programmers and recorders.

In one embodiment, an improved cardiac rhythm management system includes a storage
module that stores the separated signals. In one arrangement, the storage module can be removed
from the cardiac rhythm management system and connected to a computing device. In another
arrangement, the storage module is directly connected to a computing device without being
removed from the cardiac rhythm management system. The computing device can provide further
analysis of the separated signals, for example displaying a chaos phase space portrait using some of
the separated signals. The computing device can also store the separated signals to provide a
history of the patient's cardiac signals.

The disclosed methods can also be applied to predict the occurrence of arrhythmia within a
patient's heart. After separating recorded EKG signals into separated signals, the separated signals
can be matched with stored triggers and diagnoses as described above. If the separated signals
match stored triggers that are associated with arrhythmia, an occurrence of arrhythmia is predicted.
In other embodiments, an arrhythmia probability is then calculated, for example based on how
closely the separated signals match the stored triggers, based on records of how frequently in the
past the patient's separated signals have matched the stored triggers, and/or based on how frequently
in the past the patient has actually suffered arrhythmia. The calculated probability can then be used
to predict when the next arrhythmia will occur for the patient. Based on statistics and clinical data,
calculated probabilities can be associated with specified time periods within which an arrhythmia
will occur.
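
One way the matching and probability steps might be sketched is shown below. The text does not specify a matching metric, so the Pearson correlation, the 0.9 threshold, the relative-frequency estimate, and all function names here are assumptions chosen purely for illustration.

```python
import math

def match_score(signal, trigger):
    """Pearson correlation between a separated signal and a stored trigger template.

    Both sequences are assumed to have the same length.
    """
    n = len(signal)
    mean_s = sum(signal) / n
    mean_t = sum(trigger) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(signal, trigger))
    norm = math.sqrt(sum((s - mean_s) ** 2 for s in signal) *
                     sum((t - mean_t) ** 2 for t in trigger))
    return cov / norm if norm else 0.0

def arrhythmia_probability(past_matches, past_arrhythmias):
    """Relative frequency with which past trigger matches were followed by arrhythmia."""
    return past_arrhythmias / past_matches if past_matches else 0.0

trigger = [0.0, 1.0, 0.0, -1.0]                       # hypothetical stored trigger
score = match_score([0.1, 2.1, 0.1, -1.9], trigger)   # scaled and offset copy of the trigger
predicted = score > 0.9                               # 0.9 is an arbitrary illustrative threshold
```

A clinical system would need validated templates and thresholds; the sketch only shows how closeness of match and past frequencies could each feed a probability estimate.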

In addition to EKG signals, the disclosed systems and methods can be applied to separate
other electrical signals such as electroencephalogram signals, electromyographic signals,
electrodermographic signals, and electroneurographic signals. They can be applied to separate
other types of signals, such as sonic signals, optic signals, pressure signals, magnetic signals and
chemical signals. The disclosed systems and methods can be applied to separate signals from
internal sources, for example within a cardiac chamber, within a blood vessel, and so forth. The
disclosed systems and methods can be applied to separate signals from external sources such as the
skin surface or away from the body. They can also be applied to record and to separate signals
from animal subjects.

Although the foregoing has described certain preferred embodiments, other embodiments
will be apparent to those of ordinary skill in the art from the disclosure herein. Additionally, other
combinations, omissions, substitutions and modifications will be apparent to the skilled artisan in
view of the disclosure herein. Accordingly, the present invention is not to be limited by the
preferred embodiments, but is to be defined by reference to the following claims.

The present application incorporates by reference U.S. Patent No. 5,706,402, titled "Blind
signal processing system employing information maximization to recover unknown signals through
unsupervised minimization of output redundancy", filed November 28, 1994, in its entirety as an
APPENDIX as follows.

United States Patent No. 5,706,402
Inventor: Anthony J. Bell

Blind signal processing system employing information maximization to recover
unknown signals through unsupervised minimization of output redundancy

BLIND SIGNAL PROCESSING SYSTEM
EMPLOYING INFORMATION
MAXIMIZATION TO RECOVER UNKNOWN
SIGNALS THROUGH UNSUPERVISED
MINIMIZATION OF OUTPUT REDUNDANCY

REFERENCE TO GOVERNMENT RIGHTS 

The U.S. Government has rights in the invention disclosed and claimed herein
pursuant to Office of Naval Research grant no. N00014-93-1-0631.

BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates generally to systems for recovering the original unknown
signals subjected to transfer through an unknown multichannel system by processing
the known output signals therefrom and relates specifically to an
information-maximizing neural network that uses unsupervised learning to recover
each of a multiplicity of unknown source signals in a multichannel having
reverberation.

2. Description of the Related Art

Blind Signal Processing: In many signal processing applications, the sample signals
provided by the sensors are mixtures of many unknown sources. The "separation of
sources" problem is to extract the original unknown signals from these known
mixtures. Generally, the signal sources as well as their mixture characteristics are
unknown. Without knowledge of the signal sources other than the general statistical
assumption of source independence, this signal processing problem is known in the
art as the "blind source separation problem". The separation is "blind" because
nothing is known about the statistics of the independent source signals and nothing
is known about the mixing process.

The blind separation problem is encountered in many familiar forms. For instance,
the well-known "cocktail party" problem refers to a situation where the unknown
(source) signals are sounds generated in a room and the known (sensor) signals are
the outputs of several microphones. Each of the source signals is delayed and
attenuated in some (time varying) manner during transmission from source to
microphone, where it is then mixed with other independently delayed and attenuated
source signals, including multipath versions of itself (reverberation), which are
delayed versions arriving from different directions.

This signal processing problem arises in many contexts other than the simple
situation where each of two unknown mixtures of two speaking voices reaches one of
two microphones. Other examples involving many sources and many receivers include
the separation of radio or radar signals sensed by an array of antennas, the
separation of odors in a mixture by a sensor array, the parsing of the environment
into separate objects by our biological visual system, and the separation of
biomagnetic sources by a superconducting quantum interference device (SQUID) array
in magnetoencephalography. Other important examples of the blind source separation
problem include sonar array signal processing and signal decoding in cellular
telecommunication systems.

The blind source separation problem is closely related to the more familiar "blind
deconvolution" problem, where a single unknown source signal is extracted from a
known mixed signal that includes many time-delayed versions of the source
originating from unknown multipath distortion or reverberation (self-convolution).
The need for blind deconvolution or "blind equalization" arises in a number of
important areas such as data transmission, acoustic reverberation cancellation,
seismic deconvolution and image restoration. For instance, high-speed data
transmission over a telephone communication channel relies on the use of adaptive
equalization, which can operate either in a traditional training mode that transmits
a known training sequence to establish deconvolution parameters or in a blind mode.
The class of communication systems that may need blind equalization capability
includes high-capacity line-of-sight digital radio (cellular telecommunications).
Such a channel suffers from anomalous propagation conditions arising from natural
conditions, which can degrade digital radio performance by causing the transmitted
signal to propagate along several paths of different electrical length (multipath
fading). Severe multipath fading requires a blind equalization scheme to recover
channel operation.

In reflection seismology, a reflection coefficient sequence can be blindly extracted
from the received signal, which includes echoes produced at the different reflection
points of the unknown geophysical model. The traditional linear-predictive seismic
deconvolution method used to remove the source waveform from a seismogram ignores
valuable phase information contained in the reflection seismogram. This limitation
is overcome by using blind deconvolution to process the received signal by assuming
only a general statistical geological reflection coefficient model.

Blind deconvolution can also be used to recover unknown images that are blurred by
transmission through unknown systems.

Blind Separation Methods: Because of the fundamental importance of both the blind
separation and blind deconvolution signal processing problems, practitioners have
proposed several classes of methods for solving the problems. The blind separation
problem was first addressed in 1986 by Jutten and Herault ("Blind separation of
sources, Part I: An adaptive algorithm based on neuromimetic architecture", Signal
Processing 24 (1991) 1-10), who disclose the HJ neural network with backward
connections that can usually solve the simple two-element blind source separation
problem. Disadvantageously, the HJ network iterations may not converge to a proper
solution in some cases, depending on the initial state and on the source statistics.
When convergence is possible, the HJ network appears to converge in two stages, the
first of which quickly decorrelates the two output signals and the second of which
more slowly provides the statistical independence necessary to recover the two
unknown sources. Comon et al. ("Blind separation of sources, Part II: Problems
statement", Signal Processing 24 (1991) 11-20) show that the HJ network can be
viewed as an adaptive process for cancelling higher-order cumulants in the output
signals, thereby achieving some degree of statistical independence by minimizing
higher-order statistics among the known sensor signals.

Other practitioners have attempted to improve the HJ network to remove some of the
disadvantageous features. For instance, Sorouchyari ("Blind separation of sources,
Part III: Stability analysis", Signal Processing 24 (1991) 21-29) examines other
higher-order non-linear transforming functions other than those simple first and
third order functions proposed by Jutten et al. but concludes that the higher-order
functions cannot improve implementation of the HJ network. In U.S. Pat. No.
5,383,164, filed on Jun. 10, 1993 as application Ser. No. 08/074,940 and fully
incorporated herein by this reference, Li et al. describe a blind source separation
system based on the HJ neural network model that employs linear beamforming to
improve HJ network separation performance. Also, John C. Platt et al. ("Networks
For The Separation of Sources That Are Superimposed and Delayed", Advances in
Neural Information Processing Systems, vol. 4, Morgan-Kaufmann, San Mateo, 1992)
propose extending the original magnitude-optimizing HJ network to estimate a matrix
of time delays in addition to the HJ magnitude mixing matrix. Platt et al. observe
that their modified network is disadvantaged by multiple stable states and
unpredictable convergence.

Pierre Comon ("Independent component analysis, a new concept?", Signal Processing 36
(1994) 287-314) provides a detailed discussion of Independent Component Analysis
(ICA), which defines a class of closed form techniques useful for solving the blind
identification and deconvolution problems. As is known in the art, ICA searches for
a transformation matrix to minimize the statistical dependence among components of a
random vector. This is distinguished from Principal Components Analysis (PCA), which
searches for a transformation matrix to minimize statistical correlation among
components of a random vector, a solution that is inadequate for the blind
separation problem. Thus, PCA can be applied to minimize second order cross-moments
among a vector of sensor signals while ICA can be applied to minimize sensor signal
joint probabilities, which offers a solution to the blind separation problem. Comon
suggests that although mutual information is an excellent measure of the contrast
between joint probabilities, it is not practical because of computational
complexity. Instead, Comon teaches the use of the fourth-order cumulant tensor
(thereby ignoring fifth-order and higher statistics) as a preferred measure of
contrast because the associated computational complexity increases only as the fifth
power of the number of unknown signals.
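
The distinction between decorrelation and independence can be illustrated with a small worked example. The sketch below is not drawn from Comon's paper; the binary sources, the sum-and-difference mixture, and the variable names are assumptions chosen so that every quantity can be computed exactly.

```python
from itertools import product

# Two independent, zero-mean binary sources, each +1 or -1 with equal probability.
samples = list(product([-1, 1], repeat=2))

# Assumed toy mixture: the sum and the difference of the two sources.
mixed = [(s1 + s2, s1 - s2) for s1, s2 in samples]

def mean(values):
    """Average of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

# Second-order cross-moment: the quantity a decorrelating (PCA-style) transform removes.
cross_moment = mean(y1 * y2 for y1, y2 in mixed)

# Fourth-order cross-cumulant of zero-mean signals; it vanishes for independent signals.
cross_cumulant = (mean(y1 ** 2 * y2 ** 2 for y1, y2 in mixed)
                  - mean(y1 ** 2 for y1, _ in mixed) * mean(y2 ** 2 for _, y2 in mixed)
                  - 2 * cross_moment ** 2)
```

Here the mixed signals are already uncorrelated, so a second-order method has nothing left to correct, yet the non-zero fourth-order cross-cumulant shows that they are still statistically dependent.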

Similarly, Gilles Burel ("Blind separation of sources: A nonlinear neural
algorithm", Neural Networks 5 (1992) 937-947) asserts that the blind source
separation problem is nothing more than the Independent Components Analysis (ICA)
problem. However, Burel proposes an iterative scheme for ICA employing a back
propagation neural network for blind source separation that handles non-linear
mixtures through iterative minimization of a cost function. Burel's network differs
from the HJ network, which does not minimize any cost function. Like the HJ network,
Burel's system can separate the source signals in the presence of noise without
attempting noise reduction (no noise hypotheses are assumed). Also, like the HJ
system, practical convergence is not guaranteed because of the presence of local
minima and computational complexity. Burel's system differs sharply from traditional
supervised back-propagation applications because his cost function is not defined in
terms of difference between measured and desired outputs (the desired outputs are
unknown). His cost function is instead based on output signal statistics alone,
which permits "unsupervised" learning in his network.

Blind Deconvolution Methods: The blind deconvolution art can be appreciated with
reference to the text edited by Simon Haykin (Blind Deconvolution, Prentice-Hall,
New Jersey, 1994), which discusses four general classes of blind deconvolution
techniques, including Bussgang processes, higher-order cumulant equalization,
polyspectra and maximum likelihood sequence estimation. Haykin neither considers nor
suggests specific neural network techniques suitable for application to the blind
deconvolution problem.

Blind deconvolution is an example of "unsupervised" learning in the sense that it
learns to identify the inverse of an unknown linear time-invariant system without
any physical access to the system input signal. This unknown system may be a
nonminimum phase system having one or more zeroes outside the unit circle in the
frequency domain. The blind deconvolution process must identify both the magnitude
and the phase of the system transfer function. Although identification of the
magnitude component requires only the second-order statistics of the system output
signal, identification of the phase component is more difficult because it requires
the higher-order statistics of the output signal. Accordingly, some form of
non-linearity is needed to extract the higher-order statistical information
contained in the magnitude and phase components of the output signal. Such
non-linearity is useful only for unknown source signals having non-Gaussian
statistics. There is no solution to the problem when the input source signal is
Gaussian-distributed and the channel is nonminimum-phase because all polyspectra of
Gaussian processes of order greater than two are identical to zero.

Classical adaptive deconvolution methods are based almost entirely on second order
statistics, and thus fail to operate correctly for nonminimum-phase channels unless
the input source signal is accessible. This failure stems from the inability of
second-order statistics to distinguish minimum-phase information from maximum-phase
information of the channel. A minimum phase system (having all zeroes within the
unit circle in the frequency domain) exhibits a unique relationship between its
amplitude response and phase response so that second order statistics in the output
signal are sufficient to recover both amplitude and phase information for the input
signal. In a nonminimum-phase system, second-order statistics of the output signal
alone are insufficient to recover phase information and, because the system does not
exhibit a unique relationship between its amplitude response and phase response,
blind recovery of source signal phase information is not possible without exploiting
higher-order output signal statistics. These require some form of non-linear
processing because linear processing is restricted to the extraction of
second-order statistics.
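
A minimal sketch of this phase-blindness, under assumed toy values: two hypothetical two-tap filters, one minimum-phase and one its time-reversed maximum-phase counterpart, have identical autocorrelations, so a white input passed through either would produce the same second-order output statistics.

```python
def autocorrelation(taps):
    """Deterministic autocorrelation of an FIR impulse response for lags 0..len-1."""
    n = len(taps)
    return [sum(taps[i] * taps[i + lag] for i in range(n - lag)) for lag in range(n)]

minimum_phase = [1.0, 0.5]   # zero at z = -0.5, inside the unit circle
maximum_phase = [0.5, 1.0]   # zero at z = -2.0, outside the unit circle

r_min = autocorrelation(minimum_phase)
r_max = autocorrelation(maximum_phase)
```

Because `r_min` and `r_max` coincide, no method that looks only at autocorrelations can tell which of the two channels was in use.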

Bussgang techniques for blind deconvolution can be viewed as iterative polyspectral
techniques, where rationale are developed for choosing the polyspectral orders with
which to work and their relative weights by subtracting a source signal estimate
from the sensor signal output. The Bussgang techniques can be understood with
reference to Sandro Bellini ("Chapter 2: Bussgang Techniques For Blind Deconvolution
and Equalization", Blind Deconvolution, S. Haykin (ed.), Prentice Hall, Englewood
Cliffs, N.J., 1994), who characterizes the Bussgang process as a class of processes
having an auto-correlation function equal to the cross-correlation of the process
with itself as it exits from a zero-memory non-linearity.

Polyspectral techniques for blind deconvolution lead to unbiased estimates of the
channel phase without any information about the probability distribution of the
input source signals. The general class of polyspectral solutions to the blind
decorrelation problem can be understood with reference to a second Simon Haykin
textbook ("Ch. 20: Blind Deconvolution", Adaptive Filter Theory, Second Ed., Simon
Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1991) and to Hatzinakos et al.
("Ch. 5: Blind Equalization Based on Higher Order Statistics (HOS)", Blind
Deconvolution, Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1994).

Thus, the approaches in the art to the blind separation and deconvolution problems
can be classified as those using non-linear transforming functions to spin off
higher-order statistics (Jutten et al. and Bellini) and those using explicit
calculation of higher-order cumulants and polyspectra (Haykin and Hatzinakos et
al.). The HJ network does not reliably converge even for the simplest two-source
problem and the fourth-order cumulant tensor approach does not reliably converge
because of truncation of the cumulant expansion. There is accordingly a clearly-felt
need for blind signal processing methods that can reliably solve the blind
processing problem for significant numbers of source signals.

Unsupervised Learning Methods: In the biological sensory system arts, practitioners
have formulated neural training optimality criteria based on studies of biological
sensory neurons, which are known to solve blind separation and deconvolution
problems of many kinds. The class of supervised learning techniques normally used
with artificial neural networks are not useful for these problems because
supervised learning requires access to the source signals for training purposes.
Unsupervised learning instead requires some rationale for internally creating the
necessary teaching signals without access to the source signals.

Practitioners have proposed several rationale for unsupervised learning in
biological sensory systems. For instance, Linsker ("An Application of the Principle
of Maximum Information Preservation to Linear Systems", Advances in Neural
Information Processing Systems 1, D. S. Touretzky (ed.), Morgan-Kaufmann (1989))
shows that his well-known "infomax" principle (first proposed in 1987) explains why
biological sensor systems operate to minimize information loss between neural layers
in the presence of noise. In a later work ("Local Synaptic Learning Rules Suffice to
Maximize Mutual Information in a Linear Network", Neural Computation 4 (1992)
691-702) Linsker describes a two-phase learning algorithm for maximizing the mutual
information between two layers of a neural network. However, Linsker assumes a
linear input-output transforming function and multivariate Gaussian statistics for
both source signals and noise components. With these assumptions, Linsker shows that
a "local synaptic" (biological) learning rule is sufficient to maximize mutual
information but he neither considers nor suggests solutions to the more general
blind processing problem of recovering non-Gaussian source signals in a non-linear
transforming environment.

Simon Haykin ("Ch. 11: Self-Organizing Systems III: Information-Theoretic Models",
Neural Networks: A Comprehensive Foundation, S. Haykin (ed.), MacMillan, New York
1994) discusses Linsker's "infomax" principle, which is independent of the neural
network learning rule used in its implementation. Haykin also discusses other
well-known principles such as the "minimization of information loss" principle
suggested in 1988 by Plumbley et al. and Barlow's "principle of minimum redundancy",
first proposed in 1961, either of which can be used to derive a class of
unsupervised learning rules.

Joseph Atick ("Could information theory provide an ecological theory of sensory
processing?", Network 3 (1992) 213-251) applies Shannon's information theory to the
neural processes seen in biological optical sensors. Atick observes that information
redundancy is useful only in noise and includes two components: (a) unused channel
capacity arising from suboptimal symbol frequency distribution and (b) intersymbol
redundancy or mutual information. Atick suggests that optical neurons apparently
evolved to minimize the troublesome intersymbol redundancy (mutual information)
component of redundancy rather than to minimize overall redundancy. H. B. Barlow
("Unsupervised Learning", Neural Computation 1 (1989) 295-311) also examines this
issue and shows that "minimum entropy coding" in a biological sensory system
operates to reduce the troublesome mutual information component even at the expense
of suboptimal symbol frequency distribution. Barlow shows that the mutual
information component of redundancy can be minimized in a neural network by feeding
each neuron output back to other neuron inputs through anti-Hebbian synapses to
discourage correlated output activity. This "redundancy reduction" principle is
offered to explain how unsupervised perceptual learning occurs in mammals.

S. Laughlin ("A Simple Coding Procedure Enhances a Neuron's Information Capacity",
Z. Naturforsch. 36 (1981) 910-912) proves that the optimal neuron [...] maximizes
information capacity through equalization of the probability distribution for each
neural code value (minimizing the unused channel capacity component of redundancy),
thereby confirming Barlow's "minimum redundancy" principle. J. J. Hopfield
("Olfactory computation and object perception", Proc. Natl. Acad. Sci. USA
[remainder of citation illegible]) [...] source solution in neurons using the HJ
neuron model for minimizing output redundancy.

Becker et al. ("Self-organizing neural network that discovers surfaces in random-dot
stereograms", Nature, vol. 355, pp. 161-163, Jan. 9, 1992) propose a standard
back-propagation neural network learning model modified to replace the external
teacher (supervised learning) by internally-derived teaching signals (unsupervised
learning). Becker et al. use non-linear networks to maximize mutual information
between different sets of outputs, contrary to the blind signal recovery
requirement. By increasing redundancy, their network discovers invariance in
separate groups of inputs, which can be selected out of information passed forward
to improve processing efficiency.

Thus, it is known in the neural network arts that anti-Hebbian mutual interaction
can be used to explain the decorrelation or minimization of redundancy observed in
biological vision systems. This can be appreciated with reference to H. B. Barlow et
al. ("Adaptation and Decorrelation in the Cortex", The Computing Neuron, R. Durbin
et al. (eds.), Addison-Wesley (1989)) and to Schraudolph et al. ("Competitive
Anti-Hebbian Learning of Invariance", Advances in Neural Information Processing
Systems 4, J. E. Moody et al. (eds.), Morgan-Kaufmann (1991)). In fact, practitioners
have suggested that Linsker's "infomax" principle and Barlow's "minimum redundancy"
principle may both yield the same neural network learning procedures. Until now,
however, non-linear versions of these procedures applicable to the blind signal
processing problem have been unknown in the art.

The Blind Processing Problem: As mentioned above, blind source separation and blind
deconvolution are related problems in signal processing. The blind source separation
problem can be succinctly stated as where a set of unknown source signals S_1(t),
..., S_I(t) are mixed together linearly by an unknown matrix [A_ij]. Nothing is
known about the sources or the mixing process, both of which may be time-varying,
although the mixing process is assumed to vary slowly with respect to the source.
The blind separation task is to recover the original source signals from the
measured superpositions of them, X_1(t), ..., X_J(t), by finding a square matrix
[W_ij] that is a permutation of the inverse of the unknown matrix [A_ij]. The blind
deconvolution problem can be similarly stated as where a single unknown signal S(t)
is convolved with an unknown tapped delay-line filter A_1, ..., A_K, producing the
corrupted measured signal X(t) = A(t) * S(t), where A(t) is the impulse response of
the unknown (perhaps slowly time-varying) filter. The blind deconvolution task is to
recover S(t) by finding and convolving X(t) with a tapped delay-line filter W_1,
..., W_L having the impulse response W(t) that reverses the effect of the unknown
filter A(t).
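
The separation half of this problem statement can be made concrete with a toy example. In the sketch below, the 2-by-2 matrix, the integer "signals", and the helper `matmul` are hypothetical values chosen for exact arithmetic; the un-mixing matrix is written down by hand from the known inverse, whereas a blind method would have to learn it from the measurements alone.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

sources = [[1, 2, 3],       # S_1(t) at three sample times
           [4, 5, 6]]       # S_2(t) at three sample times
mixing = [[2, 1],
          [1, 1]]           # the "unknown" matrix A, known here only to build the example
measured = matmul(mixing, sources)      # X = A S

# The inverse of A is [[1, -1], [-1, 2]]; swapping its rows gives a permuted inverse.
unmixing = [[-1, 2],
            [1, -1]]
recovered = matmul(unmixing, measured)  # W X
```

The recovered rows are the original sources in swapped order, which is why the task is stated in terms of a permutation of the inverse: the ordering of the sources cannot be determined blindly.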

There are many similarities between the two problems. In one, source signals are
corrupted by the superposition of other source signals and, in the other, a single
source signal is corrupted by superposition of time-delayed versions of itself. In
both cases, unsupervised learning is required because no error signals are available
and no training signals are provided. In both cases, second-order statistics alone
are inadequate to solve the more general problem. For instance, a second-order
decorrelation technique such as that proposed by Barlow et al. would find
uncorrelated (linearly independent) projections [Y_j] of the input sensor signals
[X_j] when attempting to separate unknown source signals {S_i} but is limited to
discovering a symmetric decorrelation matrix that cannot reverse the effects of
mixing matrix [A_ij] if the mixing matrix is asymmetric. Similarly, second-order
decorrelation techniques based on the autocorrelation function, such as
prediction-error filters, are phase-blind and do not offer sufficient information to
estimate the phase characteristics of the corrupting filter A(t) when applied to the
more general blind deconvolution problem.

Thus, both blind signal processing problems require the use of higher-order
statistics as well as certain assumptions regarding source signal statistics. For
the blind separation problem, the sources are assumed to be statistically
independent and non-Gaussian. With this assumption, the problem of learning [W_ij]
becomes the ICA problem described by Comon. For blind deconvolution, the original
signal S(t) is assumed to be a "white" process consisting of independent symbols.
The blind deconvolution problem then becomes the problem of removing from the
measured signal X(t) any statistical dependencies across time that are introduced by
the corrupting filter A(t). This process is sometimes denominated the "whitening" of
X(t).

As used herein, both the ICA procedure and the "whitening" of a time series are
denominated "redundancy reduction". The first class of techniques uses some type of
explicit estimation of cumulants and polyspectra, which can be appreciated with
reference to Haykin and Hatzinakos et al. Disadvantageously, such "brute force"
techniques are computationally intensive for high numbers of sources or taps and may
be inaccurate when cumulants higher than fourth order are ignored, as they usually
must be. The second class of techniques uses static non-linear functions, the Taylor
series expansions of which yield higher-order terms. Iterative learning rules
containing such terms are expected to be somehow sensitive to the particular
higher-order statistics necessary to accurate redundancy reduction. This reasoning
is used by Comon et al. to explain the HJ network and by Bellini to explain the
Bussgang deconvolver. Disadvantageously, there is no assurance that the particular
higher-order statistics yielded by the (heuristically) selected non-linear function
are weighted in the manner necessary for achieving statistical independence. Recall
that the known approach to attempting improvement of the HJ network is to test
various non-linear functions selected heuristically and that the original functions
are not yet improved in the art.

Accordingly, there is a need in the art for an improved
blind processing method, such as some method of rigorously
linking a static non-linearity to a learning rule that performs
gradient ascent in some parameter guaranteed to be usefully
related to statistical dependency. Until now, this was
believed to be practically impossible because of the infinite
number of higher-order statistics associated with statistical
dependency. The related unresolved problems and
deficiencies are clearly felt in the art and are solved by this
invention in the manner described below.

SUMMARY OF THE INVENTION

This invention solves the above problem by introducing a
new class of unsupervised learning procedures for a neural
network that solve the general blind signal processing
problem by maximizing joint input/output entropy through
gradient ascent to minimize mutual information in the outputs.
The network of this invention arises from the unexpectedly
advantageous observation that a particular type of non-linear
signal transform creates learning signals with the
higher-order statistics needed to separate unknown source
signals by minimizing mutual information among neural network
output signals. This invention also arises from the second
unexpectedly advantageous discovery that mutual information
among neural network outputs can be minimized by
maximizing joint output entropy when the learning transform
is selected to match the signal probability distributions
of interest.

The process of this invention can be appreciated as a
generalization of the infomax principle to non-linear units
with arbitrarily distributed inputs uncorrupted by any known
noise sources. It is a feature of the system of this invention
that each measured input signal is passed through a
predetermined sigmoid function to adaptively maximize
information transfer by optimal alignment of the monotonic
sigmoid slope with the input signal peak probability density.
It is an advantage of this invention that redundancy is
minimized among a multiplicity of outputs merely by maximizing
total information throughput, thereby producing the independent
components needed to solve the blind separation problem.

The foregoing, together with other objects, features and
advantages of this invention, can be better appreciated with
reference to the following specification, claims and the
accompanying drawing.

BRIEF DESCRIPTION OF THE DRAWING

For a more complete understanding of this invention,
reference is now made to the following detailed description
of the embodiments as illustrated in the accompanying
drawing, wherein:

FIGS. 1A, 1B, 1C and 1D illustrate the feature of
sigmoidal transfer function alignment for optimal information
flow in a sigmoidal neuron from the prior art;

FIGS. 2A, 2B and 2C illustrate the blind source separation
and blind deconvolution problems from the prior art;

FIGS. 3A, 3B and 3C provide graphical diagrams
illustrating a joint entropy maximization example where
maximizing joint entropy fails to produce statistically
independent output signals because of improper selection of the
non-linear transforming function;

FIG. 4 shows the theoretical relationship between the
several entropies and mutual information from the prior art;

FIG. 5 shows a functional block diagram of an illustrative
embodiment of the source separation network of this
invention;

FIG. 6 is a functional block diagram of an illustrative
embodiment of the blind decorrelating network of this
invention;

FIG. 7 is a functional block diagram of an illustrative
embodiment of the combined blind source separation and
blind decorrelation network of this invention;
FIGS. 8A, 8B and 8C show typical probability density
functions for speech, rock music and Gaussian white noise;

FIG. 9 shows [...] before and after [...] performed
according to the procedure of this invention;

FIG. 10 shows the results of a blind source separation
experiment performed using the procedure of this invention;
and

FIGS. 11A, 11B, 11C, 11D, 11E, 11F, 11G, 11H, 11I, 11J,
11K and 11L show time domain filter charts illustrating the
results of the blind deconvolution of several different
corrupted human speech signals according to the procedure of
this invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

This invention arises from the unexpectedly advantageous
observation that a class of unsupervised learning rules for
maximizing information transfer in a neural network solves
the blind signal processing problem by minimizing
redundancy in the network outputs. This class of new learning
rules is now described in information theoretic terms, first
for a single input and then for a multiplicity of unknown
input signals.

Information Maximization For a Single Source

In a single-input network, the mutual information that the
output y of a network contains about its input x can be
expressed as:

$$I(y,x) = H(y) - H(y|x)$$  [Eqn. 1]

where H(y) is the entropy of the output signal, H(y|x) is that
portion of the output signal entropy that did not come from
the input signal and I(y,x) is the mutual information. Eqn. 1
can be appreciated with reference to FIG. 4, which illustrates
the well-known relationship between input signal entropy
H(x), output signal entropy H(y) and mutual information
I(y,x).

When there is no noise or when the noise is treated as
merely another unknown input signal, the mapping between
input x and output y is deterministic and conditional entropy
H(y|x) has its lowest possible value, diverging to minus
infinity. This divergence is a consequence of the
generalization of information theory to continuous random
variables. The output entropy H(y) is really the differential
entropy of output signal y with respect to some reference,
such as the noise level or the granularity of the discrete
representation of the variables in x and y. These theoretical
complexities can be avoided by restricting the network to the
consideration of the gradient of information theoretic
quantities with respect to some parameter w. Such gradients are
as well-behaved as are discrete-variable entropies because
the reference terms involved in the definition of differential
entropies disappear. In particular, Eqn. 1 can be
differentiated to obtain the corresponding gradients as follows:

$$\frac{\partial}{\partial w} I(y,x) = \frac{\partial}{\partial w} H(y)$$  [Eqn. 2]

because, in the noiseless case, H(y|x) does not depend on w
and its differential disappears. Thus, for continuous
deterministic matchings, the mutual information between
network input and network output can be maximized by
maximizing the gradient of the entropy of the output alone, which
is an unexpectedly advantageous consequence of treating
noise as another unknown source signal. This permits the
discussion to continue without knowledge of the input signal
statistics.

Referring to FIG. 1A, when a single input x is passed
through a transforming function g(x) to give an output
variable y, both I(y,x) and H(y) are maximized when the
high-density portion of the input probability density
function f_x(x) is aligned with the steepest sloping portion of
non-linear transforming function g(x). This is equivalent to
the alignment of a neuron input-output function to the
expected distribution of incoming signals that leads to
optimal information flow in sigmoidal neurons shown in
FIGS. 1C-1D. FIG. 1D shows a zero-mode distribution
matched to the sigmoid function in FIG. 1C. In FIG. 1A, the
input x having a probability distribution f_x(x) is passed
through the non-linear sigmoidal function g(x) to produce
output signal y having a probability distribution f_y(y). The
information in the probability density function f_y(y) varies
responsive to the alignment of the mean and variance of x
with respect to the threshold w_0 and slope w of g(x). When
g(x) is monotonically increasing or decreasing (thereby
having a unique inverse), the output signal probability
density function f_y(y) can be written as a function of the
input signal probability density function f_x(x) as follows:

$$f_y(y) = \frac{f_x(x)}{\left|\partial y/\partial x\right|}$$  [Eqn. 3]

where | | denotes absolute value.

Eqn. 3 leads to the unexpected discovery of an
advantageous gradient descent process because the output signal
entropy can be expressed in terms of the output signal
probability density function as follows:

$$H(y) = -E[\ln f_y(y)] = -\int_{-\infty}^{\infty} f_y(y)\,\ln f_y(y)\,dy$$  [Eqn. 4]

where E[ ] denotes expected value. Substituting Eqn. 3 into
Eqn. 4 produces the following:

$$H(y) = E\left[\ln\left|\frac{\partial y}{\partial x}\right|\right] - E[\ln f_x(x)]$$  [Eqn. 5]

The second term on the right side of Eqn. 5 is simply the
unknown input signal entropy H(x), which cannot be
affected by any changes in the parameter w that defines
non-linear function g(x). Therefore, only the first term on
the right side of Eqn. 5 need be maximized to maximize the
output signal entropy H(y). This first term is the average
logarithm of the effect of input signal x on output signal y
and may be maximized by considering the input signals as
a training set with density f_x(x) and deriving an online,
stochastic gradient ascent learning rule expressed as:

$$\Delta w \propto \frac{\partial H}{\partial w} = \frac{\partial}{\partial w}\left(\ln\left|\frac{\partial y}{\partial x}\right|\right) = \left(\frac{\partial y}{\partial x}\right)^{-1}\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right)$$  [Eqn. 6]

Eqn. 6 defines a scaling measure Δw for changing the
parameter w to adjust the log of the slope of sigmoid
function. Any sigmoid function can be used to specify
measure Δw, such as the widely-used logistic transfer
function:

$$y = (1 + e^{-u})^{-1}, \quad \text{where } u = wx + w_0$$  [Eqn. 7]

in which the input x is first aligned with the sigmoid function
through multiplication by a scaling weight w and addition of
a bias weight w_0 to create an aligned signal u, which is then
non-linearly transformed by the logistic transfer function to
create signal y. Another useful sigmoid function is the
hyperbolic tangent function expressed as y=tanh(u). The
hyperbolic tangent function is a member of the general class
of functions g(x) each representing a solution to the partial
differential equation:

$$\frac{\partial}{\partial x}\,g(x) = 1 - |g(x)|^{r}$$  [Eqn. 8]

with a boundary condition of g(0)=0. The parameter r should
be selected appropriately for the assumed kurtosis of the
input probability distribution. For kurtosis above 3, either
the hyperbolic tangent function (r=2) or the non-member
logistic transfer function is well suited for the process of this
invention.
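
As an illustration of the class of sigmoids described by Eqn. 8, the following minimal sketch integrates the differential equation numerically from the boundary condition g(0)=0. It assumes Eqn. 8 has the form dg/dx = 1 - |g|^r, which is the form implied by the Eqn. 8 slope entry of Table 1 and by the statement that the hyperbolic tangent corresponds to r=2. The function name `sigmoid_from_eqn8` and the use of a fourth-order Runge-Kutta integrator are illustrative choices, not part of the specification.

```python
import math


def sigmoid_from_eqn8(x, r, steps=10000):
    """Integrate dg/dx = 1 - |g|**r from g(0) = 0 up to x.

    Assumed form of Eqn. 8; classical fourth-order Runge-Kutta.
    Works for negative x as well because the step h is then negative.
    """
    def slope(g):
        return 1.0 - abs(g) ** r

    h = x / steps
    g = 0.0
    for _ in range(steps):
        k1 = slope(g)
        k2 = slope(g + 0.5 * h * k1)
        k3 = slope(g + 0.5 * h * k2)
        k4 = slope(g + h * k3)
        g += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return g


if __name__ == "__main__":
    # For r = 2 the numerical solution should track tanh closely.
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(x, sigmoid_from_eqn8(x, 2), math.tanh(x))
```

Larger values of r give a sigmoid that stays closer to linear near the origin and saturates more abruptly, which is one way to picture how r could be tuned to the assumed input distribution.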

For the logistic transfer function (Eqn. 7), the terms in
Eqn. 6 can be expressed as:

$$\frac{\partial y}{\partial x} = w\,y(1-y)$$  [Eqn. 9]

$$\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) = y(1-y)\left(1 + wx(1-2y)\right)$$  [Eqn. 10]

Dividing Eqn. 10 by Eqn. 9 produces a scaling measure
Δw for the scaling weight learning rule of this invention
based on the logistic function:

$$\Delta w = \epsilon\left(\frac{1}{w} + x(1-2y)\right)$$  [Eqn. 11]

where ε>0 is a learning rate.

Similar reasoning leads to a bias measure Δw_0 for the bias
weight learning rule of this invention based on the logistic
transfer function, expressed as:

$$\Delta w_0 = \epsilon(1-2y)$$  [Eqn. 12]

These two learning rules (Eqns. 11-12) are implemented
by adjusting the respective w or w_0 at a "learning rate" (ε),
which is usually less than one percent (ε<0.01), as is known
in the neural network arts. Referring to FIGS. 1A-1C, if the
input probability density function f_x(x) is Gaussian, then the
bias measure Δw_0 operates to align the steepest part of the
sigmoid curve g(x) with the peak x of f_x(x), thereby
matching input density to output slope in the manner suggested
intuitively by Eqn. 3. The scaling measure Δw operates to
align the edges of the sigmoid curve slope to the particular
width (proportional to variance) of f_x(x). Thus, narrow
probability density functions lead to sharply-sloping
sigmoid functions.

The scaling measure of Eqn. 11 defines an "anti-Hebbian"
learning rule with a second "anti-decay" term. The first
anti-Hebbian term prevents the uninformative solutions
where output signal y saturates at 0 or 1 but such an
unassisted anti-Hebbian rule alone allows the slope w to
disappear at zero. The second anti-decay term (1/w) forces
output signal y away from the other uninformative situation
where slope w is so flat that output signal y stabilizes at 0.5
(FIG. 1A).

The combined effect of these two balanced terms is to
produce an output probability density function f_y(y) that is
close to the flat unit distribution function, which is known to
be the maximum entropy distribution for a random variable
bounded between 0 and 1. FIG. 1B shows a family of
sigmoid output distributions, with the most informative one
occurring at sigmoid slope w_opt. Using the logistic transfer
function as the non-linear sigmoid transformation, the
learning rule in Eqn. 11 eventually brings the slope w to w_opt,
thereby maximizing entropy in output signal y. The bias rule
in Eqn. 12 centers the mode in the sloping region at w_0
(FIG. 1A).

If the hyperbolic tangent sigmoid function is used, the
bias measure Δw_0 then becomes proportional to -2y and the
scaling measure Δw becomes proportional to -2xy+w^-1,
such that Δw_0 = -2yε and Δw = ε(-2xy+w^-1), where ε is the
learning rate. These learning rules offer the same general
features and advantages of the learning rules discussed
above in connection with Eqns. 11-12 for the logistic
transfer function. In general, any sigmoid function in the
class of solutions to Eqn. 8 selected for parametric suitability
to a particular input probability distribution can be used in
accordance with the process of this invention to solve the
blind signal processing problem. These unexpectedly
advantageous learning rules can be generalized to the
multi-dimensional case.

Joint Entropy Maximization for Multiple Sources

To appreciate the multiple-signal blind processing method
of this invention, consider the general network diagram
shown in FIG. 2A where the measured input signal vector
[X] is transformed by way of the weight matrix [W] to
produce a monotonically transformed output vector
[Y]=g([W][X]+[W_0]). By analogy to Eqn. 3, the multivariate
probability density function of [Y] can be expressed as

$$f_Y([Y]) = \frac{f_X([X])}{|J|}$$  [Eqn. 13]

where |J| is the absolute value of the Jacobian of the
transformation that produces output vector [Y] from input
vector [X]. As is well-known in the art, the Jacobian is the
determinant of the matrix of partial derivatives:

$$J = \det\left[\frac{\partial Y_i}{\partial X_j}\right]_{ij}$$  [Eqn. 14]

where det[·] denotes the determinant of a square matrix.

By analogy to the single-input case discussed above, the
method of this invention maximizes the natural log of the
Jacobian to maximize output entropy H(Y) for a given input
entropy H(X), as can be appreciated with reference to Eqn.
5. The quantity ln|J| represents the volume of space in [Y]
into which points in [X] are mapped. Maximizing this
quantity attempts to spread the training set of input points
evenly in [Y].

For the commonly-used logistic transfer function, the
resulting learning rules can be proven to be as follows:

$$[\Delta W] = \epsilon\left(([1] - 2[Y])[X]^T + \left[[W]^T\right]^{-1}\right)$$  [Eqn. 15]

$$[\Delta W_0] = \epsilon([1] - 2[Y])$$  [Eqn. 16]

In Eqn. 15, the first anti-Hebbian term has become an
outer product of vectors and the second anti-decay term has
generalized to an "anti-redundancy" term in the form of the
inverse of the transpose of the weight matrix [W]. Eqn. 15
can be written, for an individual weight W_ij, as follows:

$$\Delta W_{ij} = \epsilon\left(\frac{\mathrm{cof}[W_{ij}]}{\det[W]} + X_j(1 - 2Y_i)\right)$$  [Eqn. 17]

where cof[W_ij] denotes the cofactor of element W_ij, which
is known to be (-1)^(i+j) times the determinant of the matrix
obtained by removing the i-th row and the j-th column from the
square weight matrix [W] and ε is the learning rate.
Similarly, the i-th bias measure ΔW_i0 can be expressed as
follows:

$$\Delta W_{i0} = \epsilon(1 - 2Y_i)$$  [Eqn. 18]



The rules shown in Eqns. 17-18 are the same as those for
the single unit mapping (Eqns. 11-12) except that the
instability occurs at det[W]=0 instead of w=0. Thus, any
degenerate weight matrix leads to instability because any
weight matrix having a zero determinant is degenerate. This
fact enables different outputs Y_i to learn to represent
different things about the inputs X_j. When the weight vectors
entering two different outputs become too similar, det[W]
becomes small and the natural learning process forces these
approaching weight vectors apart. This effect is mediated by
the numerator cof[W_ij], which approaches zero to indicate
degeneracy in the weight matrix of the rest of the layer not
associated with input X_j or output Y_i.
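
The multiple-source logistic rules can be sketched for a two-input, two-output network as follows. This is a minimal illustration that assumes the rules take the form ΔW = ε(([1]-2[Y])[X]^T + ([W]^T)^-1) and ΔW_0 = ε([1]-2[Y]), which is the form the surrounding text describes (an anti-Hebbian outer product plus an anti-redundancy term). The function names, the nested-list matrix representation and the default learning rate are illustrative assumptions.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def inv_transpose_2x2(W):
    """Return ([W]^T)^-1 for a 2x2 matrix given as nested lists.

    Each entry equals cof[W_ij] / det[W], matching the per-weight form.
    """
    (a, b), (c, d) = W
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("degenerate weight matrix: det[W] = 0")
    return [[d / det, -c / det], [-b / det, a / det]]


def infomax_step(W, w0, x, lr=0.005):
    """Apply one sample-by-sample update to weights W and biases w0."""
    u = [W[i][0] * x[0] + W[i][1] * x[1] + w0[i] for i in range(2)]
    y = [logistic(ui) for ui in u]
    anti_redundancy = inv_transpose_2x2(W)
    W_new = [
        [W[i][j] + lr * (anti_redundancy[i][j] + (1.0 - 2.0 * y[i]) * x[j])
         for j in range(2)]
        for i in range(2)
    ]
    w0_new = [w0[i] + lr * (1.0 - 2.0 * y[i]) for i in range(2)]
    return W_new, w0_new


if __name__ == "__main__":
    W, w0 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    W, w0 = infomax_step(W, w0, [0.3, -0.2])
    print(W, w0)
```

The explicit check for a zero determinant mirrors the instability at det[W]=0 noted above: as two rows of W become similar, the anti-redundancy entries grow large and push the rows apart.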

Other sigmoidal transformations yield other training rules
that are similarly advantageous as discussed above in
connection with Eqn. 8. For instance, the hyperbolic tangent
function yields rules very similar to those of Eqns. 17-18:

$$[\Delta W] = \epsilon\left(\left[[W]^T\right]^{-1} - 2[Y][X]^T\right)$$  [Eqn. 19]

$$[\Delta W_0] = -2\epsilon[Y]$$  [Eqn. 20]

The usefulness of these blind source separation network
learning rules can be appreciated with reference to the
discussion below in connection with FIG. 5.

Blind Deconvolution in a Causal Filter

FIGS. 2B-2C illustrate the blind deconvolution problem.
FIG. 2C shows an unobserved data sequence S(t) entering an
unknown channel A(t), which responsively produces the
measured signal X(t) that can be blindly equalized through
a causal filter W(t) to produce an output signal U(t)
approximating the original unobserved data sequence S(t). FIG. 2B
shows the time series X(t), which is presumed to have a
length of J samples (not shown). X(t) is convolved with a
causal filter having I weighted taps, W_1, . . . , W_I, and
impulse response W(t). The causal filter output signal U(t) is
then passed through a non-linear sigmoid function g(·) to
create the training signal Y(t) (not shown). This system can be
expressed either as a convolution (Eqn. 21) or as a matrix
equation (Eqn. 22) as follows:

$$Y(t) = g(W(t) * X(t))$$  [Eqn. 21]

$$[Y] = g([W][X])$$  [Eqn. 22]

in which [Y]=g([U]) and [X] are signal sample vectors
having J samples. Of course, the vector ordering need not be
temporal. For causal filtering, [W] is a banded lower
triangular JxJ square matrix expressed as:

$$[W] = \begin{bmatrix} W_1 & 0 & \cdots & \cdots & 0 \\ W_2 & W_1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & & \vdots \\ W_I & \cdots & W_2 & W_1 & \vdots \\ 0 & \ddots & & \ddots & 0 \\ 0 & \cdots & W_I & \cdots & W_1 \end{bmatrix}$$  [Eqn. 23]

Assuming an ensemble of time series, the joint probability
distribution functions f_Y([Y]) and f_X([X]) are related by
the Jacobian of the Eqn. 22 transformation according to Eqn.
13. The ensemble can be "created" from a single time series
by breaking the series into sequences of length I, which
reduces [W] in Eqn. 23 to an IxI lower triangular matrix. The
Jacobian of the transformation is then written as follows:

$$|J| = \det\left[\frac{\partial Y(t_i)}{\partial X(t_j)}\right]_{ij} = \det[W]\,\prod_{t}\left|\frac{\partial Y(t)}{\partial U(t)}\right|$$  [Eqn. 24]

which may be decomposed into the determinant of the
weight matrix [W] of Eqn. 23 and the product of the slopes
of the sigmoidal squashing function for all times t. Because
[W] is lower-triangular, its determinant is merely the product
of the diagonal values, which is W_1^I. As before, the output
signal entropy H(Y) is maximized by maximizing the
logarithm of the Jacobian, which may be written as:

$$\ln|J| = I\,\ln|W_1| + \sum_{t}\ln\left|\frac{\partial Y(t)}{\partial U(t)}\right|$$  [Eqn. 25]

If the hyperbolic tangent is selected as the non-linear
sigmoid function, then differentiation with respect to the
filter weights W(t) provides the following two simple
learning rules:

$$\Delta W_1 = \epsilon\sum_{t}\left(\frac{1}{W_1} - 2X_tY_t\right)$$  [Eqn. 26]

$$\Delta W_i = \epsilon\sum_{t}\left(-2X_{t-i+1}Y_t\right), \quad \text{where } i > 1$$  [Eqn. 27]

In Eqns. 26-27, W_1 is the "leading weight" and
W_i (i=2, . . . , I) represent the remaining weights in a delay
line having I weighted taps linking the input signal sample
X_(t-i+1) to the output signal sample Y_t. The leading weight W_1
therefore adapts like a weight connected to a neuron with
only that one input (Eqn. 11 above). The other tap weights
{W_i} attempt to decorrelate the past input from the present
output. Thus, the leading weight keeps the causal filter
from "shrinking".

Other sigmoidal functions may be used to generate
similarly useful learning rules, as discussed above in connection
with Eqn. 8. The equivalent rules for the logistic transfer
function discussed above can be easily deduced to be:

$$\Delta W_1 = \epsilon\sum_{t}\left(\frac{1}{W_1} + X_t(1 - 2Y_t)\right)$$  [Eqn. 28]

$$\Delta W_i = \epsilon\sum_{t} X_{t-i+1}(1 - 2Y_t), \quad \text{where } i > 1$$  [Eqn. 29]



The usefulness of these causal filter learning rules can be
appreciated with reference to the discussion below in
connection with FIGS. 6 and 7.
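
The hyperbolic tangent causal filter rules can be sketched as a single accumulated update over one block of samples. The sketch assumes the rules have the form ΔW_1 = ε Σ_t (1/W_1 - 2 X_t Y_t) for the leading weight and ΔW_i = ε Σ_t (-2 X_(t-i+1) Y_t) for the remaining taps, and it treats samples before the start of the block as zero. The function name `deconv_step`, the zero-based indexing (w[0] is the leading weight W_1) and the boundary handling are illustrative assumptions.

```python
import math


def deconv_step(w, x, lr=0.005):
    """One accumulated update of causal filter taps w over the block x.

    w[0] is the leading weight; w[i] weights the sample delayed by i steps.
    """
    taps = len(w)
    dw = [0.0] * taps
    for t in range(len(x)):
        # Causal convolution: u(t) = sum_i w[i] * x(t - i).
        u = sum(w[i] * x[t - i] for i in range(taps) if t - i >= 0)
        y = math.tanh(u)
        # Leading weight: anti-decay term plus anti-Hebbian term.
        dw[0] += 1.0 / w[0] - 2.0 * x[t] * y
        # Remaining taps: decorrelate past input from present output.
        for i in range(1, taps):
            if t - i >= 0:
                dw[i] += -2.0 * x[t - i] * y
    return [w[i] + lr * dw[i] for i in range(taps)]


if __name__ == "__main__":
    w = [1.0, 0.0, 0.0]
    block = [0.5, -0.3, 0.8, 0.1, -0.6]
    print(deconv_step(w, block))
```

Only the leading weight carries the 1/W_1 term, which is what keeps the filter from collapsing toward zero while the other taps remove dependencies across time.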

Information Maximization v. Statistical Dependence 

The process of this invention relies on the unexpectedly
advantageous observation that, under certain conditions, the
maximization of the mutual information I(Y,X) operates to
minimize the mutual information between separate outputs
{U_i} in a multiple source network, thereby performing the
redundancy reduction required to solve the blind signal
processing problem. The usefulness of this relationship was
unsuspected until now. When limited to the usual logistic
transfer or hyperbolic tangent sigmoid functions, this
invention appears to be limited to the general class of
super-Gaussian signals having kurtosis greater than 3. This
limitation can be understood by considering the following
example shown in FIGS. 3A-3C.

Referring to FIG. 3A, consider a network with two
outputs y_1 and y_2, which may be either two output channels
from a blind source separation network or two signal
samples at different times for a blind deconvolution network.
The joint entropy of these two variables can be written as:

$$H(y_1,y_2) = H(y_1) + H(y_2) - I(y_1,y_2)$$  [Eqn. 30]

Thus, the joint entropy can be maximized by maximizing
the individual entropies while minimizing the mutual
information I(y_1,y_2) shared between the two. When the mutual
information I(y_1,y_2) is zero, the two variables y_1 and y_2 are
statistically independent and the joint probability density
function is equal to the product of the individual probability
density functions so that f_y1y2(y_1,y_2)=f_y1(y_1)f_y2(y_2). Both
the ICA and the "whitening" approach to deconvolution are
examples of pair-wise minimization of mutual information
I(y_1,y_2) for all pairs y_1 and y_2. This process is variously
denominated factorial code learning, predictability
minimization, independent component analysis (ICA) and
redundancy reduction.
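
The relationship H(y_1,y_2) = H(y_1) + H(y_2) - I(y_1,y_2) can be checked numerically on a small example. The sketch below uses discrete probability tables rather than the differential entropies of the text, purely to make the bookkeeping concrete; the function names and the example tables are illustrative assumptions.

```python
import math


def entropy(probabilities):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def joint_entropy_terms(joint):
    """joint[i][j] = P(y1 = i, y2 = j). Returns (H1, H2, H12, I)."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    h1, h2 = entropy(p1), entropy(p2)
    h12 = entropy([p for row in joint for p in row])
    # Mutual information follows from rearranging the joint entropy identity.
    return h1, h2, h12, h1 + h2 - h12


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    dependent = [[0.5, 0.0], [0.0, 0.5]]
    print(joint_entropy_terms(independent))
    print(joint_entropy_terms(dependent))
```

For the independent table the joint probabilities factor into the marginals and the mutual information vanishes; for the fully dependent table the same marginal entropies yield a lower joint entropy.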

The process of this invention is a stochastic gradient
ascent procedure that maximizes the joint entropy H(y_1,y_2),
thereby differing sharply from these "whitening" and ICA
procedures known for minimizing mutual information
I(y_1,y_2). The system of this invention rests on the
unexpectedly advantageous discovery of the general conditions
under which maximizing joint entropy operates to reduce
mutual information (redundancy), thereby reducing the
statistical dependence of the two outputs y_1 and y_2.

Under many conditions, maximizing joint entropy H(y_1,y_2)
does not guarantee minimization of mutual information
I(y_1,y_2) because of interference from the other single
entropy terms H(y_i) in Eqn. 30. FIG. 3C shows one
pathological example where a "diagonal" projection of two
independent, uniformly-distributed variables x_1 and x_2 is
preferred over the "independent" projection shown in FIG.
3B when joint entropy is maximized. This occurs because of
a mismatch between the requisite alignment of input
probability distribution function and sigmoid slope discussed
above in connection with FIGS. 1A-1C and Eqn. 8. The
learning procedure of this invention achieves the higher
value of joint entropy shown in FIG. 3C rather than the
desired value shown in FIG. 3B because of the higher individual
output entropy values H(y_i) arising from the triangular
probability distribution functions of (x_1+x_2) and (x_1-x_2) of
FIG. 3C, which more closely match the sigmoid slope (not
shown). This interferes with the minimization of mutual
information I(y_1,y_2) because the individual entropy H(y_i)
increases offset or mask undesired increases in mutual
information to provide the higher joint entropy H(y_1,y_2)
sought by the process.

The inventor believes that such interference has little
significant effect in most practical situations, however. As
mentioned above in connection with Eqn. 8, the sigmoidal
function is not limited to the usual two functions and indeed
can be tailored to the particular class of probability
distribution functions expected by the process of this invention.
Any function that is a member of the class of solutions to the
partial differential Eqn. 8 provides a sigmoidal function
suitable for use with the process of this invention. It can be
shown that this general class of sigmoidal functions leads to
the following two learning rules according to this invention:
$$\Delta W_{ij} = \epsilon\left(\frac{\mathrm{cof}[W_{ij}]}{\det[W]} - r\,X_j\,|Y_i|^{r-1}\,\mathrm{sgn}(Y_i)\right)$$  [Eqn. 31]

$$\Delta W_{i0} = -\epsilon\,r\,|Y_i|^{r-1}\,\mathrm{sgn}(Y_i)$$  [Eqn. 32]

where sgn(Y_i) is +1 for Y_i>0 and -1 for Y_i<0,
and where parameter r is chosen appropriately for the
presumed kurtosis of the probability distribution function of
the source signals {S_j}. This formalism can be extended to
cover skewed and multimodal input distributions by
extending Eqn. 8 to produce an increasingly complex
polynomial in g(x) such that



Even with the usual logistic transfer function (Eqn. 7) and
the hyperbolic tangent function (r=2), it appears that the
problem of individual entropy interference is limited to
sub-Gaussian probability distribution functions having a
kurtosis less than 3. Advantageously, many actual analog
signals, including the speech signals used in the
experimental verification of the system of this invention, are
super-Gaussian in distribution. They have longer tails and are
more sharply peaked than the Gaussian distribution, as may be
appreciated with reference to the three distribution functions
shown in FIGS. 8A-8C. FIG. 8A shows a typical speech
probability distribution function, FIG. 8B shows the
probability distribution function for rock music and FIG. 8C
shows a typical Gaussian white noise distribution. The
inventor has found that joint entropy maximization for
sigmoidal networks always minimizes the mutual information
between the network outputs for all super-Gaussian
signal distributions tested. Special sigmoid functions can be
selected that are suitable for accomplishing the same result
for sub-Gaussian signal distributions as well, although the
precise learning rules must be selected in accordance with
the parametric learning rules of Eqns. 31-32.
Different sigmoid non-linearities provide different
anti-Hebbian terms. Table 1 provides the anti-Hebbian terms
from the learning rules resulting from several interesting
non-linear transformation functions. The
information-maximization rule consists of an anti-redundancy term
which always has a form of [[W]^T]^-1 and an anti-Hebbian
term that keeps the unit from saturating.
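TABLE 1

Function:                  Slope:                      Anti-Hebb term:
y_i = g(u_i)               dy_i/du_i

1/(1 + e^(-u_i))           y_i(1 - y_i)                x_j(1 - 2y_i)
tanh(u_i)                  1 - y_i^2                   -2 x_j y_i
solution to Eqn. 8         1 - |y_i|^r                 -r x_j |y_i|^(r-1) sgn(y_i)
arctan(u_i)                1/(1 + u_i^2)               -2 x_j u_i/(1 + u_i^2)
erf(u_i)                   (2/sqrt(pi)) e^(-u_i^2)     -2 x_j u_i
e^(-u_i^2)                 -2 u_i y_i                  x_j(1 - 2u_i^2)/u_i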

Table 1 shows that only the Eqn. 8 solutions (including
the hyperbolic tangent function for r=2) and the logistic
transfer functions produce anti-Hebbian terms that can yield
higher-order statistics. The other functions use the net input
u_i as the output variable rather than using the actual
transformed output y_i. Tests performed by the inventor show
that the erf function is unsuitable for blind separation. In
fact, stable weight matrices using the -2x_j u_i term can be
calculated from the covariance matrix of the inputs alone. The
learning rule for a Gaussian radial basis function node is
interesting because it contains u_i in both the numerator and
denominator. The denominator term limits the usefulness of
such a rule because data points near the radial basis function
center would cause instability. Radial transfer functions are
generally appropriate only when input distributions are annular.

Illustrative Networks

FIG. 5 shows a functional block diagram illustrating an
exemplary embodiment of a four-port blind signal separation
network according to this invention. Each of the four
input signals {X_i} represents "sensor" output signals such as
the electrical signal received from a microphone at a
"cocktail party" or an antenna output signal. Each of the four
network output signals {U_i} is related to the four input
signals by weights so that [U_i]=[W_ij][X_j]+[W_i0]. The four
bias weights {W_i0} are updated regularly according to the
learning rule of Eqn. 18 discussed above and each of the
sixteen scaling weights {W_ij} are updated regularly
according to the learning rule of Eqn. 17 discussed above. The
weight updates can occur after every signal sample or may be
accumulated over many signal samples for updating in a
global mode. Each of the weight elements in FIG. 5
exemplified by element 18 includes the logic necessary to produce
and accumulate the ΔW update according to the applicable
learning rule.

The separation network in FIG. 5 can also be used to
remove interfering signals from a receive signal merely by,
for example, isolating the interferer as output signal U_1 and
then subtracting U_1 from the receive signal of interest, such
as receive signal X_1. In such a configuration, the network
shown in FIG. 5 is herein denominated an "interference
cancelling" network.

FIG. 6 shows a functional block diagram illustrating a
simple causal filter operated according to the method of this
invention for blind deconvolution. A time-varying signal is
presented to the network at input 22. The five spaced taps
{T_i} are separated by a time-delay interval τ in the manner
well-known in the art for transversal filters. The five weight
factors {W_i} are established and updated by internal logic
(not shown) according to the learning rules shown in Eqns.
26-27 discussed above. The five weighted signals {U_i}
are summed at a summation device 24 to produce the single
time-varying output signal U_1. Because input signal X_1
includes an unknown non-linear combination of
time-delayed versions of an unknown source signal S_1, the system
of this invention adjusts the tap weights {W_i} such that
output signal U_1 approximates the unknown source signal
S_1.

FIG. 7 shows a functional block diagram illustrating the
combination of blind source separation network and blind
deconvolution filter systems of this invention. The blind
separation learning rules and the blind deconvolution rules
discussed above can be easily combined in the form
exemplified by FIG. 7. The objective is to maximize the natural
logarithm of a Jacobian with local lower triangular structure,
which yields the expected learning rule that forces the
leading weights {W_ij1} in the filters to follow the blind
separation rules and all others to follow a decorrelation rule
except that tapped weights {W_ijk} are interposed between a
delayed input and an output.

The outputs {U_j} are used to produce a set of training
signals given by Eqn. 33:

$$Y_j = g(U_j)$$  [Eqn. 33]

where g(·) denotes the selected sigmoidal transfer function.
If the hyperbolic tangent function is selected as the
sigmoidal non-linearity, the following training rules are used in
the system of this invention:

$$\Delta W_{ij1} = \epsilon\left(\frac{\mathrm{cof}[W_{ij1}]}{\det[W_1]} - 2X_jY_i\right)$$  [Eqn. 34]

$$\Delta W_{ijk} = \epsilon\left(-2X_{j,\,t-k+1}\,Y_i\right), \quad \text{when } k > 1$$  [Eqn. 35]

where ΔW_ij1 are the elements of the "lead" plane and ε is the
learning rate.

In FIG. 7, each of the three input signals {X_i} contain
multipath distortion that requires blind deconvolution as
well as an unknown mixture of up to three unknown source
signals {S_k}. Each of the source separation planes,
exemplified by plane 24, operates substantially as discussed
above in connection with FIG. 5 for the three input signals
{X_k}, by providing three output contributions to the
summing elements exemplified by summing circuit 26. Plane 24
contains the lead weights for the 16 individual causal filters
formed by the network. Preliminary experiments in which
speech signals were simultaneously separated and deconvolved
using the learning rule discussed above resulted in recovery
of apparently perfect speech.

Experimental Results

The inventor conducted experiments using three-second
segments of speech recorded from various speakers with
only one speaker per recording. All speech segments were
sampled at 8,000 Hz from the output of the auxiliary
microphone of a Sparc-10 workstation. No special
post-processing was performed on the waveforms other than the
normalization of amplitudes to a common interval [-3,3] to
permit operation with the equipment used. The network was
trained using the stochastic gradient ascent procedure of this
invention.

Unsupervised learning in a neural network may proceed
either continuously or in a global mode. Continuous learning
consists in slightly modifying the weights after each
propagation of an input vector through the network. This kind of
learning is useful for signals that arrive in real time or when
local storage capacity is restricted. In a global learning
mode, a multiplicity of samples are propagated through the
network and the results stored locally. Statistics are
computed exactly on these data and the weights are modified
only after accumulating and processing the multiplicity of
signal samples.

To reduce computational overhead, these experiments were performed using the global learning
mode. To ensure that the input ensemble is stationary in time, random points were selected from
the three-second window to generate the appropriate input vectors. Various learning rates were
tested, with 0.005 preferred. As used herein, learning rate ε establishes the actual weight
adjustment such that W_ij = W_ij + εΔW_ij, as is known in the art. The inventor found that
reducing the learning rate over the learning process was useful.
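
The global learning mode described above can be sketched as follows. This is a minimal illustration only: the function names, the batch size, the linear decay schedule and the `final_fraction` parameter are assumptions introduced for the example and are not taken from the specification, which states only that random points were drawn from the window, that a rate of 0.005 was preferred, and that reducing the rate during learning was useful.

```python
import numpy as np

def global_mode_step(weights, samples, measure_fn, learning_rate, batch_size, rng):
    """One global-mode update: draw random time points, accumulate the
    measure over them, then adjust the weights once (W = W + eps * dW)."""
    # Random time indices keep the input ensemble stationary in time.
    idx = rng.integers(0, samples.shape[1], size=batch_size)
    batch = samples[:, idx]
    delta = measure_fn(weights, batch)  # hypothetical callback returning dW
    return weights + learning_rate * delta

def annealed_rate(initial_rate, step, total_steps, final_fraction=0.1):
    """Linearly reduce the learning rate (the schedule is an assumption)."""
    frac = step / max(total_steps - 1, 1)
    return initial_rate * (1.0 - (1.0 - final_fraction) * frac)
```

A caller would compute `annealed_rate(0.005, step, total_steps)` before each call to `global_mode_step`, so that early updates are coarse and later ones fine.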

Blind Separation Results: The network architecture shown in FIGS. 2A and 5 together with the
learning rules in Eqns. 17-18 were found to be sufficient to perform blind separation of at
least seven unknown source signals. A random mixing matrix [A] was generated with values usually
in the interval [-1,1]. The mixing matrix [A] was used to generate the several mixed time series
[X_j] from the original sources [S_i]. The unmixing matrix [W] and the bias vector [W_0] were
then trained according to the rules in Eqns. 17-18.
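
The training step can be illustrated with a short sketch. Eqns. 17-18 are not reproduced in this section, so the code assumes the logistic-function form of the bias and scaling measures recited in the claims (1-2Y for the bias and cof(W)/det(W) + X(1-2Y) for the scaling weights), averaged over a batch of samples. The function names and the `objective` helper are hypothetical and serve only to show that these measures are the gradient of ln|J|.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def objective(W, w0, X):
    """ln|J| averaged over samples, for Y = logistic(W X + w0)."""
    Y = logistic(W @ X + w0[:, None])
    log_slopes = np.log(Y * (1.0 - Y))          # ln(dY_i/dU_i)
    return np.log(abs(np.linalg.det(W))) + np.mean(np.sum(log_slopes, axis=0))

def logistic_measures(W, w0, X):
    """Scaling and bias measures (without the learning rate)."""
    Y = logistic(W @ X + w0[:, None])
    n_samples = X.shape[1]
    # cof(W_ij)/det(W) is element (i, j) of the transposed inverse of W.
    dW = np.linalg.inv(W).T + (1.0 - 2.0 * Y) @ X.T / n_samples
    dw0 = np.mean(1.0 - 2.0 * Y, axis=1)
    return dW, dw0
```

Under these assumptions a training loop would repeatedly apply `W += rate * dW` and `w0 += rate * dw0` to batches of mixed signals generated as `X = A @ S`.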

FIG. 10 shows the results of the attempted separation of five source signals. The mixtures
[X_j] formed an incomprehensible babble that could not be penetrated by the human ear. The
unmixed solutions shown as [Y_i] were obtained after presenting about 500,000 time samples,
equivalent to 20 passes through the complete three-second series. Any residual interference in
the output vector elements [Y_i] is inaudible to the human ear. This can be appreciated with
reference to the permutation structure of the product of the final weight matrix [W] and the
initial mixing matrix [A]:



$$
[W][A] =
\begin{bmatrix}
-4.09 & 0.13 & 0.09 & -0.07 & -0.01 \\
0.07 & -2.92 & 0.00 & 0.02 & -0.06 \\
0.02 & -0.02 & -0.06 & -0.08 & -2.20 \\
0.02 & 0.03 & 0.00 & 1.97 & 0.02 \\
-0.07 & 0.14 & -3.50 & -0.01 & 0.04
\end{bmatrix}
$$

As can be seen, the residual interference factors are only a few percent of the single
substantial entry in each row and column, thereby demonstrating that weight matrix [W]
substantially removes all effects of mixing matrix [A] from the signals.
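
The permutation-structure test applied to the product of [W] and [A] can be expressed as a short check. The sketch below is illustrative: the function name is hypothetical, and the 3-by-3 matrix in the usage note is made-up data, not a result from the experiments.

```python
import numpy as np

def permutation_residual(product):
    """Return (is_permutation, worst_ratio) for a square matrix W A.

    is_permutation is True when the largest-magnitude entries of the rows
    fall in distinct columns. worst_ratio is the largest ratio of a
    residual entry to the dominant entry of its row or of its column.
    """
    mag = np.abs(np.asarray(product, dtype=float))
    n = mag.shape[0]
    row_peak = mag.argmax(axis=1)
    is_permutation = len(set(row_peak.tolist())) == n
    residual = mag.copy()
    residual[np.arange(n), row_peak] = 0.0   # blank the dominant entries
    row_ratio = residual.max(axis=1) / mag.max(axis=1)
    col_ratio = residual.max(axis=0) / mag.max(axis=0)
    return is_permutation, float(max(row_ratio.max(), col_ratio.max()))
```

For a hypothetical product `[[0.05, -3.0, 0.1], [2.0, 0.02, -0.04], [0.03, 0.06, -1.5]]` the check reports a permutation structure with residuals of a few percent, which is the situation described above: each output carries one source, scaled and possibly reordered.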

In a second experiment, seven source signals, including five speaking voices, a rock music
selection and white noise, were successfully separated, although the separation was still
slowly improving after 2.5 million iterations, equivalent to 100 passes through the
three-second data. For two sources, convergence is normally achieved in less than one pass
through the three seconds of data by the system of this invention.

The blind separation procedure of this invention was found to fail only when: (a) more than one
unknown source is Gaussian white noise, and (b) when the mixing matrix [A] is nearly singular.
Both weaknesses are understandable because no procedure can separate independent Gaussian
sources and, if [A] is nearly singular, then any proper unmixing matrix [W] must also be nearly
singular, making the expression in Eqn. 17 quite unstable in the vicinity of a solution.

In contrast with these results, experience with similar tests of the HJ network shows it
occasionally fails to converge for two sources and rarely converges for three sources.

Blind Deconvolution Results: Speech signals were convolved with various filters and the
learning rules in Eqns. 26-27 were used to perform blind deconvolution. Some results are shown
in FIGS. 11A-11L. The convolving filter time domains shown in FIGS. 11A, 11E and 11I contained
some zero values. For example, FIG. 11E represents the filter [0.8,0,0,0,1]. Moreover, the taps
were sometimes adjacent to each other, as in FIGS. 11A-11D, and sometimes spaced apart in time,
as in FIGS. 11I-11L. The leading weight of each filter is the right-most bar in each histogram,
exemplified by bar 30 in FIG. 11I and bar 32 in FIG. 11G.

A whitening experiment is shown in FIGS. 11A-11D, a barrel-effect experiment in FIGS. 11E-11H
and a multiple-echo experiment in FIGS. 11I-11L. For each of these three experiments, the time
domain characteristics of convolving filter [A] are shown followed by those of the ideal
deconvolving filter [W_ideal], those of the filter produced by the process of this invention
[W] and the time domain pattern produced by convolution of [W] and [A]. Ideally, the
convolution [W]*[A] should be a delta-function consisting of only a single high value at the
right-most position of the leading weight when [W] correctly inverts [A].

The first whitening example shows what happens when "deconvolving" a speech signal that has not
been corrupted (convolving filter [A] is a delta-function). If the tap spacing is close enough,
as in this case where the tap spacing is identical to the sample interval, the process of this
invention learns the whitening filter shown in FIG. 11C that flattens the amplitude spectrum of
the speech up to the Nyquist limit (equivalent to half of the sampling frequency). FIG. 9A
shows the spectrum of the speech sequence before deconvolution and FIG. 9B shows the speech
spectrum after deconvolution by the filter shown in FIG. 11C. Whitened speech sounds like a
clear sharp version of the original signal because the phase structure is preserved. By using
all available frequency levels equally, the system is maximizing information throughput in the
channel. Thus, when the original signal is not white, the deconvolving filter of this invention
will recover a whitened version of it rather than the exact original. However, when the filter
taps are spaced further apart, as in FIGS. 11E-11L, there is less opportunity for simple
whitening.

In the second "barrel-effect" example shown in FIG. 11E, a 6.25 ms echo is added to the speech
signal. This creates a mild audible barrel effect. Because filter 11E is finite in length, its
inverse is infinite in length but is shown in FIG. 11F as truncated. The inverting filter
learned in FIG. 11G resembles FIG. 11F although the resemblance tails off toward the left side
because the process of this invention actually learns an optimal filter of finite length
instead of a truncated infinite optimal filter. The resulting deconvolution shown in FIG. 11H
is very good.

The best results from the blind deconvolution process of this invention are seen when the ideal
deconvolving filter is of finite length, as in the third example shown in FIGS. 11I-11L. FIG.
11I shows a set of exponentially-decaying echoes spread out over 275 ms that may be inverted by
a two-point filter shown in FIG. 11J with a small decaying correction on the left which is an
artifact of the truncation of the convolving filter shown in FIG. 11I. As seen in FIG. 11K, the
learned filter corresponds almost exactly to the ideal filter, and the deconvolution shown in
FIG. 11L is almost perfect. This result demonstrates the sensitivity of the blind processing
method of this invention in cases where the tap-spacing is great enough (100 sample intervals)
that simple whitening cannot interfere noticeably with the deconvolution process.
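
The relationship between a single-echo convolving filter, its truncated inverse and the resulting near-delta convolution can be checked numerically. The sketch below is an illustration under stated assumptions: it writes the echo filter in causal order (direct path first, echo of gain 0.8 four samples later), uses the geometric-series inverse rather than a learned filter, and the function name and the choice of 20 terms are hypothetical.

```python
import numpy as np

def truncated_inverse(echo_gain, echo_delay, n_terms):
    """First n_terms of the inverse of h = [1, 0, ..., 0, echo_gain].

    1 / (1 + g z^-d) = sum_k (-g)^k z^(-d k), an infinite series that is
    cut off here, as a finite-length deconvolving filter must be.
    """
    w = np.zeros(echo_delay * (n_terms - 1) + 1)
    w[::echo_delay] = (-echo_gain) ** np.arange(n_terms)
    return w

# Single echo of gain 0.8 arriving 4 samples after the direct signal.
h = np.zeros(5)
h[0], h[4] = 1.0, 0.8

w = truncated_inverse(0.8, 4, n_terms=20)
combined = np.convolve(w, h)  # a delta plus one small truncation residual
```

The only non-zero entry of `combined` besides the leading 1 is the truncation residual at the far end, of magnitude 0.8 raised to the 20th power (about 0.012). A longer inverse shrinks it further, which is why a filter of finite length can only approximate the inverse of a single-echo filter.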

Clearly, other embodiments and modifications of this invention may occur readily to those of
ordinary skill in the art in view of these teachings. Therefore, this invention is to be
limited only by the following claims, which include all such embodiments and modifications
when viewed in conjunction with the above specification and accompanying drawing.

I claim:

1. A method performed in a neural network having input means for receiving a plurality J of
input signals (X_j) and output means for producing a plurality I of output signals (U_i) each
said output signal U_i representing a combination of said input signals (X_j) weighted by a
plurality I of bias weights (W_i0) and a plurality I² of scaling weights (W_ij) such that
(U_i)=(W_ij)(X_j)+(W_i0), said method minimizing the information redundancy among said output
signals (U_i), wherein 0<i≤I>1 and 0<j≤J>1 are integers, said method comprising:

(a) selecting initial values for said bias weights (W_i0) and said scaling weights (W_ij);

(b) producing a plurality I of training signals (Y_i) responsive to a transformation of said
input signals (X_j) such that Y_i=g(U_i), wherein g(x) is a nonlinear function and the
Jacobian of said transformation is J=det(∂Y_i/∂X_j) when J=I; and

(c) adjusting said bias weights (W_i0) and said scaling weights (W_ij) responsive to one or
more samples of said training signals (Y_i) such that each said bias weight W_i0 is changed
proportionately to a corresponding bias measure ΔW_i0 accumulated over said one or more
samples and each said scaling weight W_ij is changed proportionately to a corresponding
scaling measure ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein
ε>0 is a learning rate.

2. The method of claim 1 wherein said nonlinear function g(x) is a nonlinear function selected
from a group consisting essentially of the solutions to the equation

$$\frac{\partial}{\partial x} g(x) = 1 - |g(x)|^{r}$$

and said ΔW_i0=ε·(-r|Y_i|^(r-1) sgn(Y_i)) accumulated over said one or more samples and each
said scaling weight is changed proportionately to a corresponding scaling measure
ΔW_ij=ε·((cof(W_ij)/det(W_ij)) - rX_j|Y_i|^(r-1) sgn(Y_i)) accumulated over said one or more
samples.

3. The method of claim 1 wherein said nonlinear function g(x) is a nonlinear function selected
from a group consisting essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1) and said
ΔW_i0 is selected from the group consisting essentially of Δ_1W_i0=ε·(-2Y_i) and
Δ_2W_i0=ε·(1-2Y_i) accumulated over said one or more samples and each said scaling weight
W_ij is changed proportionately to a corresponding scaling measure ΔW_ij selected from the
group consisting essentially of Δ_1W_ij=ε·((cof(W_ij)/det(W_ij))-2X_jY_i) and
Δ_2W_ij=ε·((cof(W_ij)/det(W_ij))+X_j(1-2Y_i)) accumulated over said one or more samples.
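
The term cof(W_ij)/det(W_ij) in the scaling measures is element (i, j) of the transposed inverse of the weight matrix, which is usually the more convenient form to compute. The following sketch, with a hypothetical helper name and an arbitrary example matrix, verifies that identity by building the cofactor matrix from its definition.

```python
import numpy as np

def cofactor_matrix(W):
    """Cofactor matrix: C[i, j] = (-1)^(i+j) * det(W without row i, col j)."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(W, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

# Arbitrary non-singular example matrix (illustrative values only).
W = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
ratio = cofactor_matrix(W) / np.linalg.det(W)
```

Here `ratio` agrees with `np.linalg.inv(W).T` to rounding error. The identity also shows why a nearly singular weight matrix is troublesome: the determinant in the denominator approaches zero and the measure grows without bound.
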

4. A neural-network implemented method for recovering one or more of a plurality I of
independent source signals (S_i) from a plurality J≥I of sensor signals (X_j) each including a
combination of at least some of said source signals (S_i), wherein 0<i≤I>1 and 0<j≤J≥I are
integers, said method comprising:

(a) selecting a plurality I of bias weights (W_i0) and a plurality I² of scaling weights
(W_ij);

(b) adjusting said bias weights (W_i0) and said scaling weights (W_ij) by repeatedly
performing the steps of:

(b.1) producing a plurality I of estimation signals (U_i) responsive to said sensor signals
(X_j) such that (U_i)=(W_ij)(X_j)+(W_i0),

(b.2) producing a plurality I of training signals (Y_i) responsive to a transformation of
said sensor signals (X_j) such that Y_i=g(U_i), wherein g(x) is a nonlinear function and the
Jacobian of said transformation is J=det(∂Y_i/∂X_j) when J=I, and

(b.3) adjusting each said bias weight W_i0 and each said scaling weight W_ij responsive to
one or more samples of said training signals (Y_i) such that said each bias weight W_i0 is
changed proportionately to a bias measure ΔW_i0 accumulated over said one or more samples and
said each scaling weight W_ij is changed proportionately to a corresponding scaling measure
ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε>0 is a learning
rate; and

(c) producing said estimation signals (U_i) to represent said one or more recovered source
signals (S_i).
5. The method of claim 4 wherein said nonlinear function g(x) is a nonlinear function selected
from a group consisting essentially of the solutions to the equation

$$\frac{\partial}{\partial x} g(x) = 1 - |g(x)|^{r}$$

and said ΔW_i0=ε·(-r|Y_i|^(r-1) sgn(Y_i)) accumulated over said one or more samples and each
said scaling weight W_ij is changed proportionately to a corresponding scaling measure
ΔW_ij=ε·((cof(W_ij)/det(W_ij)) - rX_j|Y_i|^(r-1) sgn(Y_i)) accumulated over said one or more
samples.

6. The method of claim 4 wherein said nonlinear function g(x) is a nonlinear function selected
from a group consisting essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1) and said
adjusting comprises:

(c) adjusting said bias weights (W_i0) and said scaling weights (W_ij) responsive to one or
more samples of said training signals (Y_i) such that each said bias weight W_i0 is changed
proportionately to a corresponding bias measure ΔW_i0 selected from the group consisting
essentially of Δ_1W_i0=ε·(-2Y_i) and Δ_2W_i0=ε·(1-2Y_i) accumulated over said one or more
samples and each said scaling weight W_ij is changed proportionately to a corresponding
scaling measure ΔW_ij selected from the group consisting essentially of
Δ_1W_ij=ε·((cof(W_ij)/det(W_ij))-2X_jY_i) and Δ_2W_ij=ε·((cof(W_ij)/det(W_ij))+X_j(1-2Y_i))
accumulated over said one or more samples.

7. A method implemented in a transversal filter having an input for receiving a sensor signal
X that includes a combination of multipath reverberations of a source signal S and having a
plurality I of delay line tap output signals (T_i) distributed at intervals of one or more
time delays τ, said source signal S and said sensor signal X varying with time over a
plurality J≥I of said time delay intervals τ such that said sensor signal X has a value X_j at
time τ(j-1) and each said delay line tap output signal T_i has a value X_(j+1-i) representing
said sensor signal value X_j delayed by a time interval τ(i-1), wherein τ>0 is a predetermined
constant and 0<i≤I>1 and 0<j≤J≥I are integers, said method recovering said source signal S
from said sensor signal X and comprising:

(a) selecting a plurality I of filter weights (W_i);

(b) adjusting said filter weights (W_i) by repeatedly performing the steps of

(b.1) producing a plurality K=I of weighted output signals (V_k) by combining said delay line
tap output signals (T_i) such that (V_k)=(F_ki)(T_i), wherein 0<k≤K=I>1 are integers, and
wherein F_ki=W_(k+1-i) when 1≤k+1-i≤I and F_ki=0 otherwise,

(b.2) summing a plurality K=I of said weighted tap signals (V_k) to produce an estimation
signal

$$U = \sum_{k=1}^{K} V_k$$

wherein said estimation signal U has a value U_j at time τ(j-1),

(b.3) producing a plurality J of training signals (Y_j) responsive to a transformation of
said sensor signal values (X_j) such that Y_j=g(U_j) wherein g(x) is a nonlinear function and
the Jacobian of said transformation is J=det(∂Y_j/∂X_i) when J=I, and

(b.4) adjusting each said filter weight W_i responsive to one or more samples of said
training signals (Y_j) such that said each filter weight W_i is changed proportionately to a
corresponding leading measure ΔW_1 accumulated over said one or more samples when i=1 and a
corresponding scaling measure ΔW_i=ε·∂(ln|J|)/∂W_i accumulated over said one or more samples
otherwise; and

(c) producing said estimation signal U to represent said recovered source signal S.
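
The banded weight arrangement in step (b.1) can be pictured as a lower-triangular matrix whose application to a block of input samples is an ordinary causal convolution. The sketch below is an interpretation under stated assumptions: it uses 0-based indices (so F[k, i] = w[k - i]), treats the whole block of samples at once, and the function name and example values are hypothetical.

```python
import numpy as np

def tap_matrix(w, n):
    """n-by-n matrix with F[k, i] = w[k - i] when 0 <= k - i < len(w), else 0."""
    F = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            if 0 <= k - i < len(w):
                F[k, i] = w[k - i]
    return F

w = np.array([0.5, -0.2, 0.1])       # leading weight first (illustrative values)
x = np.arange(1.0, 7.0)              # six input samples
F = tap_matrix(w, len(x))
u = F @ x                            # filter output for each time step
sign, log_abs_det = np.linalg.slogdet(F)
```

Because `F` is lower triangular with the leading weight on its diagonal, its log-determinant is the number of samples times ln|w[0]|. Only the leading weight therefore receives a contribution from the determinant term of ln|J|, which is consistent with the claim treating the i=1 weight separately from the others.
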
8. The method of claim 7 wherein said nonlinear function g(x) is a nonlinear function selected
from a group consisting essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1) and said
ΔW_1 is selected from the group consisting essentially of

[equations illegible in the source]

accumulated over said one or more samples when i=1 and a corresponding scaling measure ΔW_i
selected from the group consisting essentially of

[equations illegible in the source]

accumulated over said one or more samples otherwise.


9. The method of claim 7 wherein said nonlinear function g(x) is a nonlinear function selected
from a group consisting essentially of the solutions to the equation

$$\frac{\partial}{\partial x} g(x) = 1 - |g(x)|^{r}$$

and said leading measure

[equation illegible in the source]

accumulated over said one or more samples when i=1 and a corresponding scaling measure

[equation illegible in the source]

accumulated over said one or more samples otherwise.

10. A neural network for recovering a plurality of source signals from a plurality of mixtures
of said source signals, said neural network comprising:

input means for receiving a plurality J of input signals (X_j) each including a combination
of at least some of a plurality I of independent source signals (S_i), wherein 0<i≤I>1 and
0<j≤J≥I are integers;

weight means coupled to said input means for storing a plurality I of bias weights (W_i0) and
a plurality of scaling weights (W_ij);

output means coupled to said weight means for producing a plurality I of output signals (U_i)
responsive to said input signals (X_j) such that (U_i)=(W_ij)(X_j)+(W_i0);

training means coupled to said output means for producing a plurality I of training signals
(Y_i) responsive to a transformation of said input signals (X_j) such that Y_i=g(U_i),
wherein g(x) is a nonlinear function and the Jacobian of said transformation is
J=det(∂Y_i/∂X_j) when J=I;

adjusting means coupled to said training means and said weight means for adjusting said bias
weights (W_i0) and said scaling weights (W_ij) responsive to one or more samples of said
training signals (Y_i) such that each said bias weight W_i0 is changed proportionately to a
corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said
scaling weight W_ij is changed proportionately to a corresponding scaling measure
ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε>0 is a learning
rate.

11. The neural network of claim 10 wherein said nonlinear function g(x) is a nonlinear
function selected from a group consisting essentially of the solutions to the equation

$$\frac{\partial}{\partial x} g(x) = 1 - |g(x)|^{r}$$

and said bias measure ΔW_i0=ε·(-r|Y_i|^(r-1) sgn(Y_i)) and said scaling measure
ΔW_ij=ε·((cof(W_ij)/det(W_ij)) - rX_j|Y_i|^(r-1) sgn(Y_i)).

12. The neural network of claim 10 wherein said nonlinear function g(x) is a nonlinear
function selected from a group consisting essentially of g_1(x)=tanh(x) and
g_2(x)=(1+e^(-x))^(-1) and said bias measure ΔW_i0 is selected from a group consisting
essentially of Δ_1W_i0=-2Y_i and Δ_2W_i0=1-2Y_i and said scaling measure ΔW_ij is selected
from a group consisting essentially of Δ_1W_ij=(cof(W_ij)/det(W_ij))-2X_jY_i and
Δ_2W_ij=(cof(W_ij)/det(W_ij))+X_j(1-2Y_i).

13. A system for adaptively cancelling one or more interferer signals (S_n) comprising:

input means for receiving a plurality J of input signals (X_j) each including a combination
of at least some of a plurality I of independent source signals (S_i) that includes said one
or more interferer signals (S_n), wherein 0<i≤I>1, 0<j≤J≥I and 0<n≤N≥1 are integers;

weight means coupled to said input means for storing a plurality I of bias weights (W_i0) and
a plurality of scaling weights (W_ij);

output means coupled to said weight means for producing a plurality I of output signals (U_i)
responsive to said input signals (X_j) such that (U_i)=(W_ij)(X_j)+(W_i0);

training means coupled to said output means for producing a plurality I of training signals
(Y_i) responsive to a transformation of said input signals (X_j) such that Y_i=g(U_i),
wherein g(x) is a nonlinear function and the Jacobian of said transformation is
J=det(∂Y_i/∂X_j);

adjusting means coupled to said training means and said weight means for adjusting said bias
weights (W_i0) and said scaling weights (W_ij) responsive to one or more samples of said
training signals (Y_i) such that each said bias weight W_i0 is changed proportionately to a
corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said
scaling weight W_ij is changed proportionately to a corresponding scaling measure
ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε>0 is a learning
rate; and

feedback means coupled to said output means and said input means for selecting one or more
said output signals (U_n) representing said one or more interferer signals (S_n) for
combination with said input signals (X_j), thereby cancelling said interferer signals (S_n).

14. The system of claim 13 wherein said nonlinear function g(x) is a nonlinear function
selected from a group consisting essentially of the solutions to the equation

$$\frac{\partial}{\partial x} g(x) = 1 - |g(x)|^{r}$$

and said bias measure ΔW_i0=ε·(-r|Y_i|^(r-1) sgn(Y_i)) and said scaling measure
ΔW_ij=ε·((cof(W_ij)/det(W_ij)) - rX_j|Y_i|^(r-1) sgn(Y_i)).

15. The system of claim 13 wherein said nonlinear function g(x) is a nonlinear function
selected from a group consisting essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1) and
said bias measure ΔW_i0 is selected from a group consisting essentially of Δ_1W_i0=-2Y_i and
Δ_2W_i0=1-2Y_i and said scaling measure ΔW_ij is selected from a group consisting essentially
of Δ_1W_ij=(cof(W_ij)/det(W_ij))-2X_jY_i and Δ_2W_ij=(cof(W_ij)/det(W_ij))+X_j(1-2Y_i).

WHAT IS CLAIMED IS:

1 . A medical system for separating electrocardiogram (EKG) signals, comprising: 

a receiving module configured to receive a plurality J of recorded EKG signals Xj 
from a plurality of EKG sensors; 
a computing module configured to separate the received signals using independent
component analysis to produce a plurality I of separated signals Yi; and

a display module configured to display the separated signals. 

2. The medical system of claim 1, wherein the display module is further configured to display 
at least a portion of the separated signals in a chaos phase space portrait. 
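
The claims do not say how the phase space portrait is constructed. One common way to obtain such a portrait from a single separated signal is time-delay embedding, sketched below purely as an assumption. The function name, the embedding dimension and the lag are hypothetical and are not taken from this document.

```python
import numpy as np

def delay_embed(signal, dim=3, lag=1):
    """Return points (x[t], x[t+lag], ..., x[t+(dim-1)*lag]) as rows."""
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim and lag")
    # Column k holds the signal shifted by k * lag samples.
    return np.stack([x[k * lag : k * lag + n_points] for k in range(dim)], axis=1)
```

Each row of the returned array is one point of the trajectory. Plotting the rows as a 3-D curve gives a portrait in which a regular rhythm tends to trace a closed loop and an irregular one fills a broader region.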

3. The medical system of claim 2, wherein the separated signals include three components of
QRS complex, and wherein the display module is further configured to display at least the three
QRS complex components in a chaos phase space portrait.

4. The medical system of claim 1, wherein the computing module is configured to separate the
recorded signals by multiplying the recorded signals by a matrix Wij such that Yi = Wij * Xj.

5. The medical system of claim 1, wherein the computing module is configured to separate the
recorded signals using a neural-network implemented method, the method comprising:

selecting a plurality I of bias weights Wio and a plurality I*J of scaling weights Wij;
adjusting the bias weights Wio and the scaling weights Wij to minimize information
redundancy among separated signals; and
producing separated signals Yi such that Yi = Wij * Xj + Wio.

6. The medical system of claim 1, further comprising a database storing a plurality of EKG 
signal triggers and corresponding diagnosis, and a matching module configured to match the 
separated signals with one or more of the stored EKG signal triggers. 
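
The matching module is not specified further in the claims. As one possible reading, a separated signal could be compared with each stored trigger by normalized correlation, as in the sketch below. The function name, the dictionary layout of the trigger store, the equal-length requirement and the 0.9 threshold are all assumptions made for illustration.

```python
import numpy as np

def best_trigger_match(signal, triggers, threshold=0.9):
    """Return (name, score) of the best-correlated stored trigger,
    or (None, score) when no trigger reaches the threshold.

    `triggers` maps a diagnosis label to a template of the same length
    as `signal` (hypothetical storage format).
    """
    def unit(v):
        v = np.asarray(v, dtype=float)
        v = v - v.mean()                 # ignore baseline offset
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    s = unit(signal)
    best_name, best_score = None, -1.0
    for name, template in triggers.items():
        score = float(np.dot(s, unit(template)))  # correlation in [-1, 1]
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Removing the mean and normalizing makes the comparison insensitive to the arbitrary offset and scale of a separated component. It is not insensitive to sign, so a practical matcher might also compare against the negated signal, since independent component analysis leaves the sign of each component undetermined.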

7. A computer-implemented method of separating electrocardiogram (EKG) recording signals,
the method comprising:

receiving a first plurality of EKG recording signals from EKG sensors placed on a
patient;

separating the first plurality of EKG recording signals using independent
component analysis to produce a second plurality of separated signals; and
displaying the separated signals.

8. The method of claim 7, further comprising displaying at least a portion of the separated 
signals in a chaos phase space portrait. 

9. The method of claim 7, wherein the patient is a pregnant patient, and wherein the separated
signals include separated signals originating from the pregnant patient and separated signals
originating from a fetus.

10. The method of claim 7, wherein the displayed separated signals are used by a physician to
determine the likelihood of arrhythmia in the patient.

11. The method of claim 7, wherein the displayed separated signals are used by a physician to
determine the likelihood of myocardial infarction in the patient.

12. The method of claim 7, wherein each of the separated signals corresponds to a location on
the patient body, wherein the displayed separated signals are used by a physician to determine the
location of an abnormal heart condition in the patient according to the separated signals'
corresponding locations.

13. A computer-assisted method of detecting arrhythmia in a patient, the method comprising:

placing a first plurality of EKG sensors on a patient to produce a first plurality of
channels of recorded EKG signals;
sending the recorded signals to a computing module to separate the first plurality of
EKG recorded signals into a first plurality of channels of separated signals using
independent component analysis; and

reviewing a display of the separated signals to determine the existence of
arrhythmia in the patient.

14. The method of claim 13, wherein reviewing a display of the separated signals comprises
identifying a second set of one or more channels of separated signals that indicate arrhythmia, the
method further comprising determining a probable location of arrhythmia according to the
respective channel numbers of the second set of separated signals.

15. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing
a plurality of EKG sensors on more than 10 body surface locations of a patient's torso.

16. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing 
a plurality of EKG sensors on more than 40 body surface locations of a patient's torso. 

17. A cardiac rhythm management system comprising: 

a cardiac signal recording module configured to record cardiac signals of a patient;
a computing module configured to separate the recorded cardiac signals into
separated signals using independent component analysis;

a detection module configured to detect or predict an abnormal condition based on
analyzing the separated cardiac signals; and

a treatment module configured to treat the patient when the abnormal condition is
detected or predicted.

18. The cardiac rhythm management system of claim 17, wherein the detection module is 
configured to compare the separated signals with a plurality of stored triggers to determine whether 
the separated signals match a stored trigger. 

19. A cardiac rhythm management system comprising:

a cardiac signal recording module configured to record cardiac signals of a patient;

a computing module configured to separate the recorded cardiac signals into
separated signals using independent component analysis;

a detection module configured to detect or predict an abnormal condition based on
analyzing the separated cardiac signals; and

a warning module configured to issue a warning when the abnormal condition is
detected or predicted.

20. The cardiac rhythm management system of claim 19, wherein the detection module is
configured to compare the separated signals with a plurality of stored triggers to determine whether
the separated signals match a stored trigger.